STATISTICAL METHODS 


in Educational and 


Psychological Research 


By 

JAMES E. WERT

Iowa State College
CHARLES O. NEIDT 
University of Nebraska 

J. STANLEY AHMANN 


Cornell University 




New York 
APPLETON-CENTURY-CROFTS, INC. 


COPYRIGHT, 1954, BY
APPLETON-CENTURY-CROFTS, INC. 


All rights reserved. This book, or parts 
thereof, must not be reproduced in any 
form without permission of the publishers. 


524-1 


Library of Congress Card Number: 54-5149 




PRINTED IN THE UNITED STATES OF AMERICA 


Preface 


The purpose of this book is twofold. First, we have attempted to illus- 
trate the application and interpretation of those statistical methods which 
are most useful in the solution of research problems in education and psy- 
chology. Second, we have aimed to provide a background of techniques 
upon which advanced courses in statistical theory and methodology may 
be based. 

A lack of college mathematics does not preclude use of this volume. The 
mathematical derivation of formulas may well be undertaken subse- 
quently to, rather than concurrently with, the study of the statistical 
methods here described. Since we feel that the first courses in statistical 
methodology should emphasize the interpretation and application of tech- 
niques, we have included many illustrations of actual research problems. 
Some hypothetical problems are offered when their use contributes to 
sounder pedagogy and increased understanding. 

The contents of this volume are designed for a two-semester or three- 
quarter sequence. A choice of contents can be made for courses of shorter 
duration. In general, the first eight chapters, portions of Chapter 9, “Chi 
Square,” and all of Chapter 10, “Analysis of Variance—Single Classifica- 
tion,” are suggested as a basis for an introductory one-term course. 

Throughout the book we stress modern statistical methods, and some 
new techniques are proposed. The contributions of R. A. Fisher, applied 
in the work of George W. Snedecor and his associates, are especially 
noted. 

The authors are indebted to Professor Sir Ronald A. Fisher, Cambridge, 
to Dr. Frank Yates, Rothamsted, and to Messrs. Oliver and Boyd Limited, 
Edinburgh, for permission to reprint Table No. III, Distribution of t, and 
Table No. IV, Distribution of χ², from their book “Statistical Tables for
Biological, Agricultural, and Medical Research”; to Professor George W. 
Snedecor, Iowa State College, and to the Iowa State College Press for 
permission to reprint Table 10.7, The 5 and 1 per cent Values of F, from 
the book “Statistical Methods” (Fourth Edition), 1946; and to the 
McGraw-Hill Book Company for permission to use certain ideas and 
examples used in an earlier book by one of the authors. Appreciation is 
also extended to the many graduate students at Iowa State College, the 
University of Nebraska, and Cornell University for their criticism of the 
mimeographed editions of the manuscript. 

During the ten-year period in which this book has been in preparation, 
certain individuals have contributed so much that they deserve special 
notice. These men are: Dr. Alonzo Myster, Professor of Experimental 
Statistics, Virginia State College; Dr. Orlando Kreider, Assistant Pro- 
fessor of Mathematics, Iowa State College; Dr. John E. Bicknell, Director 
of Research, Iowa State Education Association; Dr. Eli A. Zubay, Asso- 
ciate Professor of Actuarial Science, Drake University; Dr. John M. 
MacRae, Assistant Professor of Psychology, University of Omaha; Dr. 
Arthur Gowan, Registrar, Iowa State College; Dr. Willard H. Nelson, 
Assistant Professor of Psychology, Alabama Polytechnic Institute; Dr. 
William M. Slaichert, Director of Research, Iowa State Department of
Public Instruction; Dr. John P. Malloy, Acting Director, Guidance and 
Placement Center, Marquette University; Dr. W. A. Hunter, Assistant 
Professor of Education, Tuskegee Institute; Dr. Earle Canfield, Associate
Professor of Mathematics, Drake University; Dr. Arthur Williams,
Associate Professor of Chemistry, San Jose State College; and Dr. Wilbur 
Sprain, Assistant Professor of Chemistry, San Jose State College. 

Throughout the final stages of the preparation of this manuscript, the 
help and stimulation of Mrs. Jeanne Coltrane have been invaluable, far 
beyond the call of duty of a secretary-computer. 


J. E. WERT 
C. O. NEIDT
J. S. AHMANN 




Contents 
Preface 
CHAPTER 
1. CLASSIFICATION AND PRESENTATION OF RESEARCH INFORMATION 
2. MEASURES OF CENTRAL TENDENCY
3. QUARTILES, DECILES, AND PERCENTILES
4. MEASURES OF VARIABILITY 
5. COEFFICIENT OF CORRELATION 
6. CLASSICAL THEORY OF SAMPLING 
7. STATISTICAL INFERENCE—ESTIMATION 
8. STATISTICAL INFERENCE—TESTING HYPOTHESES 
9. CHI SQUARE
10. ANALYSIS OF VARIANCE—SINGLE CLASSIFICATION
11. ANALYSIS OF VARIANCE—MULTIPLE CLASSIFICATION
12. ANALYSIS OF VARIANCE—DOUBLE CLASSIFICATION CORRECTION
FOR DISPROPORTIONALITY
13. LINEAR REGRESSION
14. SERIAL CORRELATION AND eut ANALYSIS 
15. NONLINEAR REGRESSION
16. OTHER TECHNIQUES OF CORRELATION
17. STATISTICAL TECHNIQUES IN MEASUREMENT
18. ANALYSIS OF COVARIANCE
19. FURTHER APPLICATIONS OF DISCRIMINANT ¡a A 


APPENDIX A. DETERMINATION OF REGRESSION COEFFICIENTS 
APPENDIX B. TABLES
Index 






1 


Classification and Presentation 


of Research Information 


The past fifty years have seen substantial advances in the development
of scientific method in education and psychology. The emphasis on the
collection and analysis of evidence demanded by the scientific method
has increased the need for statistical methodology in research in these
fields. Contrary to popular opinion, statistical methodology is not an
occult activity of a mathematical magician. Rather, it is concerned with
numerical statements of evidence. Thus statistical method is a
fundamental tool which a research worker cannot disregard in the
assembly, analysis, summarization, and interpretation of the evidence
which is so essential for scientific progress.


NUMERICAL EVIDENCE 
Numerical evidence may be assembled from research situations
concerning almost all characteristics of people, places, times, and
things. Observation of a number of cases with respect to any given
characteristic yields evidence of individual differences of magnitude or
of kind. Thus, people differ in weight, height, age, intelligence,
interests, and aptitudes. These characteristics yield differences in
magnitude. Any characteristic which yields individual differences of
magnitude is here defined as a variable characteristic. Such
characteristics are also referred to as continuum characteristics.
Observations of variable characteristics for a group of individuals
result in measurement data. Measurement data can be ordered according
to magnitude, and any individual may differ from another by a certain
amount.

Individual differences also occur in kind. Any characteristic which
yields individual differences of kind is here defined as a nonvariable
characteristic or a noncontinuum characteristic. Observations of
noncontinuum or nonvariable characteristics result in enumeration data.
In the
assembly of such data the individuals are classified into categories and 
the number of cases in each category is counted. The categories of a 
classification based upon a nonvariable characteristic cannot be ordered 
according to magnitude. Each individual observed should be classifiable 
into only one category, and sufficient categories should be available to 
accommodate all individuals observed. 
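The counting of enumeration data can be sketched in a few lines of Python. This is an illustration only: the function name `count_by_category` and the sample labels are hypothetical. The function refuses any observation for which no category has been provided, which mirrors the rule that sufficient categories must be available to accommodate every individual observed.

```python
from collections import Counter

def count_by_category(observations, categories):
    """Count enumeration data, one category per observation.

    Raises ValueError if any observation has no category, since every
    individual observed must be classifiable.
    """
    unclassified = sorted({obs for obs in observations if obs not in categories})
    if unclassified:
        raise ValueError(f"no category provided for: {unclassified}")
    counts = Counter(observations)
    # Report every category in the order given, including empty ones.
    return {category: counts[category] for category in categories}

# Hypothetical observations of a nonvariable characteristic
print(count_by_category(["rural", "urban", "urban", "suburban"],
                        ["rural", "suburban", "urban"]))
```

The categories are returned in the order in which they were supplied, not by size. Categories of a nonvariable characteristic have no natural order of magnitude.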

Variable characteristics predominate when research procedures involve 
human subjects. Achievements, aptitudes, appreciations, and interests 
are examples of variable characteristics. Since individuals differ in the 
amount of these characteristics possessed, the characteristics can be meas- 
ured in a graduated manner. In many measurements such as test scores, 
the units are numerical. In other situations, descriptive units such as 
excellent, good, fair, and poor may be used. If desirable for statistical 
treatment, these descriptive units may be transmuted to numerical values. 
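The following is a minimal sketch of such a transmutation. It uses the values 4, 3, 2, 1, and 0 for the letter marks A through F, which this chapter later describes as the usual practice. Any other descriptive scale (excellent, good, fair, poor) would need its own explicitly stated mapping. The names `LETTER_VALUES`, `transmute`, and `mean_mark` are hypothetical.

```python
# Usual numerical values for letter marks (A = 4 through F = 0)
LETTER_VALUES = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def transmute(marks, values=LETTER_VALUES):
    """Replace each descriptive mark with its numerical value."""
    return [values[mark] for mark in marks]

def mean_mark(marks, values=LETTER_VALUES):
    """Average of the transmuted marks."""
    numbers = transmute(marks, values)
    return sum(numbers) / len(numbers)

print(transmute(["A", "C", "B"]))         # [4, 2, 3]
print(mean_mark(["A", "B", "B", "C"]))    # 3.0
```

Averaging the transmuted values treats the steps between marks as equal. That is an assumption, and the chapter goes on to examine it.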

The problem of precise measurement of variable characteristics is as 
acute in education and psychology as it is in the physical sciences. It 
should be pointed out that no measurement is absolutely accurate. The 
chemist with the most sensitive analytical balance must estimate weight 
to some fractional part of a milligram. In a similar manner, perhaps with 
less confidence, letter marks are assigned to students in different courses. 
The inability to obtain precise measures of human characteristics is a 
limiting factor whenever the purpose is to counsel an individual, but
is a consideration of less importance in research studies involving groups 
of individuals. Generalizations may be drawn concerning group reaction 
which are entirely tenable for a group but which would be extremely 
dubious if applied to any given individual within the group. 


IMPORTANCE OF DEFINING TERMS 


One useful method of summarizing evidence consists of assembling data
in tabular form such as that shown in Table 1. At first thought the
classification of any given student seems obvious, but such is not the
case.

TABLE 1. Sex and Grade Level of College Students

SEX       FRESHMAN  SOPHOMORE  JUNIOR  SENIOR  TOTAL
Male         924       831       792     760    3307
Female       815       736       687     653    2891
Both        1739      1567      1479    1413    6198

What constitutes a freshman? How much credit must a student accumulate
before he is classified as a sophomore? Arbitrary decisions must
necessarily be made by defining terms. Any research study must report the
definitions which have been made. Such definitions should be explicit 
enough that any given individual in the study can immediately be classi- 
fied. Failure to provide definitions leads to the critical comment that 
there are prevaricators, liars, and statisticians, presumably in the order 
stated. For example, two universities, having approximately equal total 
enrollment, may report freshman enrollment in the ratio of 3 to 1. The 
apparent discrepancy lies in the definition of freshman. In one school, all 
transfer students may be reported as freshmen. In the other school, these 
students may be listed in the grade level suggested by an evaluation of 
their transfer credit. In one school, a freshman may be defined as any 
student who has not as yet completed one-fourth of the required courses 
for graduation, whereas in another school, the cutting point between a 
freshman and a sophomore may be somewhat lower. 

The necessity of a uniform definition of terms for comparison of two 
groups on any classification is paramount. For example, if school children 
are to be classified as retarded, accelerated, and normal, it is necessary 
that these three terms be so defined that whenever any given pupil is 
considered, it is evident at once into which group he should be classified. 
The importance of satisfactory definition in research studies cannot be
overemphasized.


CONTINGENCY TABLES 


Tables in which the entries are numbers of cases resulting from a two- 
way classification of characteristics, such as shown in Table 1, are called 
contingency tables. In reporting evidence in a contingency table, per- 
centages are often computed for convenience in interpretation. The use 
of percentages is never justified except when the number of cases in a cell 
cannot be directly compared. For example, the reporting of percentages 
of men and women who are freshmen, sophomores, juniors, and seniors 
would not be justified if the total number of men and the total number 
of women were equal. 

It is possible in any two-way contingency table to compute percentages 
in either direction. Thus, in Table 1, the percentages of men and women 
who are freshmen, sophomores, juniors, and seniors may be obtained, or 
the percentages of each class who are men and women may be found. 
The decision concerning which way to compute percentages should be 
governed by the interpretation desired. If no interpretation of the evi- 
dence in the table is to be made, the computation of percentages, or any 
other statistical measure, is unimportant if not absurd. 
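Percentages can be computed in both directions from the cell counts of Table 1. The sketch below does this in Python. The function names are hypothetical, and the printed percentages are computed from the table rather than taken from the book.

```python
# Cell counts from Table 1: one row per sex, one column per grade level
LEVELS = ["Freshman", "Sophomore", "Junior", "Senior"]
TABLE_1 = {
    "Male": [924, 831, 792, 760],
    "Female": [815, 736, 687, 653],
}

def row_percentages(table):
    """Per cent of each sex found at each grade level; each row sums to 100."""
    return {sex: [100 * n / sum(counts) for n in counts]
            for sex, counts in table.items()}

def column_percentages(table):
    """Per cent of each grade level made up of each sex; each column sums to 100."""
    column_totals = [sum(column) for column in zip(*table.values())]
    return {sex: [100 * n / total for n, total in zip(counts, column_totals)]
            for sex, counts in table.items()}

by_row = row_percentages(TABLE_1)
by_column = column_percentages(TABLE_1)
for level, r, c in zip(LEVELS, by_row["Male"], by_column["Male"]):
    # r: share of all men in this class; c: share of this class who are men
    print(f"{level:<10} {r:5.1f} {c:5.1f}")
```

Row percentages answer the question "how are the men distributed among the classes?" Column percentages answer the question "what proportion of each class is male?" Which of the two to report depends on the interpretation desired.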

It should be further noted that the computation of percentages in 
which the divisor is less than 100 yields results which can be misleading 
to a reader. For example, one department in a university reported an
increase of 1600 per cent over its previous year's number of graduates. At
first thought such a value might be interpreted as an increase of almost 
phenomenal size. A much more revealing interpretation can be made, 
however, when it is pointed out that there was only one graduate of this 
department the first year and 17 the second year. 
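The arithmetic behind the 1600 per cent figure is (17 − 1) / 1 × 100. One safeguard is to report the raw counts together with the percentage, so that a very small base is visible to the reader. The helper names below are hypothetical.

```python
def percent_increase(previous, current):
    """Per cent change from a previous count to a current one."""
    if previous == 0:
        raise ValueError("a per cent increase has no meaning when the base is zero")
    return 100 * (current - previous) / previous

def describe_change(previous, current):
    """Report the raw counts alongside the percentage, so a tiny base is visible."""
    return f"{previous} -> {current} ({percent_increase(previous, current):+.0f} per cent)"

print(describe_change(1, 17))   # 1 -> 17 (+1600 per cent)
```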

In many investigations, considerable evidence is assembled concerning 
some numerically expressed variable. The unarranged data when listed 
provide only the vaguest idea of numerical size or order. The assembled 
data take on meaning only after some classification, or summarization, 
has been made. One method of organizing measurement data consists of 
the preparation of a frequency distribution. The frequency distribution 
may be presented either in a table or in a graph. 


PREPARATION OF FREQUENCY DISTRIBUTIONS 


The unarranged vocabulary test scores shown in Table 2 will be used 
to illustrate the steps involved in preparing a frequency distribution. An 
inspection of the unarranged scores reveals that the highest score is 59 
and the lowest score is 12, a range of 47 points. In order to group the 
scores within this range into intervals, arbitrary divisions must be made. 
It would be possible to group into one interval all scores from 12 to 14,
from 12 to 20, from 12 to 32, or any of many other interval sizes,
depending upon the number of intervals into which the scores are to be
divided.

TABLE 2. Unarranged Vocabulary Test Scores

The first step in preparing a frequency distribution from measurement 
data, therefore, is to decide upon the number of intervals to be used. The 
number of intervals depends primarily upon the purpose of the classifica- 
tion. Whenever the data are to be statistically analyzed after the fre- 
quency distribution has been constructed, it is customary to choose an 
interval size such that the number of intervals resulting is between 10 
and 20. If the number of intervals is less than 10, considerable loss of 
the exactness of the original data ensues. If the number of intervals is 
greater than 20, little advantage in interpretability or ease of computa- 
tion is gained over classifying the original data by single units. Dividing 
the range by the proposed number of intervals and rounding this value 
to the nearest whole number will yield a class interval height which will 
accomplish the desired result. It would, of course, be possible to select 
the interval height by dividing several proposed interval heights into the 
range and then selecting a result as close as possible to the desired num- 
ber of class intervals. The latter method can be used effectively whenever 
it is advantageous to use an interval height which possesses some desired 
characteristic such as divisibility by ten or by five. For the data shown 
in Table 2, a decision of dividing into about 10 groups suggests, from the 
range of 47, an interval height of 5 points. 
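The rule just described (range divided by the proposed number of intervals, then rounded) can be sketched as follows. The function names are hypothetical. Halves are rounded upward here because Python's built-in `round()` rounds halves to the nearest even number.

```python
import math

def interval_height(lowest, highest, proposed_intervals):
    """Range divided by the proposed number of intervals, rounded half up."""
    score_range = highest - lowest
    # math.floor(x + 0.5) rounds halves upward; round() would turn 4.5 into 4.
    return max(1, math.floor(score_range / proposed_intervals + 0.5))

def approximate_intervals(lowest, highest, height):
    """About how many intervals a given height yields (range / height)."""
    return (highest - lowest) / height

# Vocabulary scores of Table 2: lowest 12, highest 59, range 47
print(interval_height(12, 59, 10))             # 5
for h in (2, 3, 4, 5):
    print(h, round(approximate_intervals(12, 59, h), 1))
```

The loop corresponds to the alternative method of trying several candidate heights and keeping the one that gives a suitable number of intervals.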

Whenever the size of the interval has been determined, it is ordinarily
maintained throughout the entire distribution. For example, it is
impractical to have a distribution in which the intervals are partly
5-point intervals and partly 7-point intervals. Stated otherwise, it is
desirable to have the size of the interval constant throughout the
distribution. It is somewhat easier to check the accuracy of the limits
used in the intervals of the distribution when the lowest value in any
interval is divisible by the size of the interval. Thus, it is somewhat
easier to check the accuracy in this distribution if the lower limit of
the lowest interval is started at 10. Not only is this value divisible by
5, but the lower limits of the other intervals are also divisible by 5.
In Table 3, the bottom interval has a lower limit of 10 and the top
interval a lower limit of 55.

TABLE 3. Work Table for Frequency Distribution of Scores on a
Vocabulary Test

SCORE
INTERVALS   TALLY                                       FREQUENCY
55-59       |||                                                 3
50-54       ||||| |||                                           8
45-49       ||||| ||||| |||||                                  15
40-44       ||||| ||||| ||||| ||||| ||||| |                    26
35-39       ||||| ||||| ||||| ||||| ||||| ||||| |||||          35
30-34       ||||| ||||| ||||| ||||| ||||| ||||| |              31
25-29       ||||| ||||| ||||| |||||                            20
20-24       ||||| ||||| ||                                     12
15-19       ||||| |||                                           8
10-14       ||||                                                4
            Total                                             162

After the intervals encompassing the distribution have been established, 
each score is tallied in its appropriate interval in a work table similar to 
Table 3. The frequency of cases in each interval is then obtained by 
counting the corresponding number of tallies. 
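In code, the tallying step reduces to placing each score in the interval whose lower limit is the largest multiple of the interval size not exceeding the score. The sketch below is hypothetical. Table 2's raw scores are not reproduced in this copy, so a few made-up scores stand in for them.

```python
from collections import Counter

def frequency_distribution(scores, height):
    """Tally scores into intervals of constant height, listed from the top down.

    Lower limits are multiples of the height, so a score s belongs to the
    interval whose lower limit is (s // height) * height.
    """
    counts = Counter((s // height) * height for s in scores)
    top, bottom = max(counts), min(counts)
    # Include empty intervals so the distribution has no gaps.
    return [((lower, lower + height - 1), counts[lower])
            for lower in range(top, bottom - 1, -height)]

# Hypothetical scores; Table 2's raw scores are not reproduced here.
scores = [12, 14, 17, 23, 23, 31, 38, 44, 59]
for (low, high), f in frequency_distribution(scores, 5):
    print(f"{low}-{high}", f)
```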


PRESENTATION OF FREQUENCY DISTRIBUTIONS 
IN TABLES 


A table for reporting a frequency distribution does not include the 
tally entries shown in Table 3. It must include, however, headings indi- 
cating interval limits and frequencies as shown in Table 4. It may also 
include, depending upon the purpose, entries for percentages or cumulative 
percentages. 

The description of an interval limit made for the work table shown in
Table 3 designated the lowest and highest reported scores which might
occur in that interval. This description is recommended for presenting
a frequency distribution and, in the interest of feasibility and accuracy,
is almost essential in its preparation. In presentation, however, the
midpoint or only the lower limit may be shown. It is also possible that,
for any of the foregoing methods of presenting interval limits, the
reported and theoretical limits do not coincide. This failure of
agreement will receive further consideration in terms of precision of
reported units of measurement.

TABLE 4. Frequency Distribution of Scores on a Vocabulary Test

                                   CUMULATIVE FREQUENCY
              FREQUENCY        FROM BOTTOM        FROM TOP
SCORE                PER               PER               PER
INTERVALS      f     CENT        f     CENT        f     CENT
   (1)        (2)    (3)        (4)    (5)        (6)    (7)
55-59           3      2        162    100          3      2
50-54           8      5        159     98         11      7
45-49          15      9        151     93         26     16
40-44          26     16        136     84         52     32
35-39          35     22        110     68         87     54
30-34          31     19         75     46        118     73
25-29          20     12         44     27        138     85
20-24          12      8         24     15        150     93
15-19           8      5         12      7        158     98
10-14           4      2          4      2        162    100

It is not unusual, in reporting the characteristics of a set of variable 
data, to compute percentages. This procedure often facilitates the de- 
scription of the distribution, particularly to the lay person. Sometimes 
the description can best be made by the computation of the percentage 
of cases within each interval and sometimes by the computation of the 
cumulative percentages either from the top or from the bottom of the 
distribution. The information assembled in Table 3 may then be reported 
as shown in Table 4. In any given table, not all of these percentage items 
should be included. The first column, (1), obviously must be included,
although designations for the interval limits may be used other than the
one here shown as the lowest and highest reported score. The number of 
cases in each interval, (2), appears in all frequency distributions. The 
percentage of cases in each interval, (3), may or may not be of sufficient 
importance for describing a distribution. It is laboring the obvious to in- 
clude such a table entry whenever the total number of cases in the dis- 
tribution is 100. This column entry is omitted, likewise, whenever one or 
the other of the cumulative percentages better serves the purpose of 
interpretation. 
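The cumulative columns of Table 4 are running totals of column (2), taken from the bottom and from the top. Each percentage is a running total divided by 162. The sketch below reproduces columns (4) to (7) from the frequencies alone. The variable names are hypothetical. Plain rounding gives 7 rather than the printed 8 in column (3) for the 20-24 interval. The printed value was perhaps adjusted so that column (3) totals 100.

```python
from itertools import accumulate

# Table 4 frequencies, listed from the lowest interval (10-14) to the highest (55-59)
LOWER_LIMITS = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
FREQUENCIES = [4, 8, 12, 20, 31, 35, 26, 15, 8, 3]

def percents(values, n):
    """Each value as a whole-number per cent of n."""
    return [round(100 * v / n) for v in values]

n = sum(FREQUENCIES)                                        # 162 cases
from_bottom = list(accumulate(FREQUENCIES))                 # column (4)
from_top = list(accumulate(reversed(FREQUENCIES)))[::-1]    # column (6)

for lower, f, cb, pb, ct, pt in zip(LOWER_LIMITS, FREQUENCIES,
                                    from_bottom, percents(from_bottom, n),
                                    from_top, percents(from_top, n)):
    print(f"{lower}-{lower + 4}", f, cb, pb, ct, pt)
```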


PRESENTATION OF FREQUENCY DISTRIBUTIONS 
IN GRAPHS 


A graph of a frequency distribution, for some purposes, may be more 
useful than a table for describing the distribution. The information 
shown in Table 4 is shown graphically in Figure 1. The values of the
vocabulary scores are shown on the base line, or X-axis, and the
frequency of cases on the vertical axis, or Y-axis. When plotting the
frequency for any given interval, the frequency is plotted above the
midpoint of that interval.

Fig. 1. Frequency Distribution in a Frequency Polygon. (Vertical axis, Frequency; base line, Vocabulary Score.)

In Figure 2 a smoothed curve has been passed among the plotted 
points. This procedure is often followed when the data are few. The 
assumption in this case is that if the data had been more numerous, the 
plotted figure would probably approach this smoothed curve in appearance.

Fig. 2. Smoothed Frequency Distribution. (Vertical axis, Frequency; base line, Vocabulary Score.)

It should be apparent that a smoothed curve cannot be obtained by
plotting any data directly unless the impossible condition that infinite
data are utilized is satisfied. The usual procedure for obtaining the
approximate points through which the smoothed curve is passed is as
follows: for each interval along the base line, a point is plotted above the
mid-point of that interval which has been obtained by first finding the 
total of the frequencies in that interval and in the two adjoining inter- 
vals, the one above and the one below, and then dividing this total by 
three. For example, to find the point for the smoothed curve to plot 
above the mid-point of the 20-24 interval in Table 3, the frequencies 8, 
12, and 20 are summed and the result, 40, is divided by 3. Thus the point 
for the smoothed curve above this interval is 13.33. In a similar manner 
the point to be plotted above the mid-point of the 55-59 interval is 
found to be 3.67. 
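The three-interval averaging described above is a simple moving average. The sketch below applies it to the Table 4 frequencies, with an empty interval added beyond each end so that the curve can taper off. The padding and the variable names are assumptions of this illustration. The sketch reproduces the two worked values in the text, 13.33 and 3.67.

```python
# Table 4 frequencies from the lowest interval (10-14) to the highest (55-59),
# with an empty interval added at each end (5-9 and 60-64).
MIDPOINTS = [7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62]
FREQUENCIES = [0, 4, 8, 12, 20, 31, 35, 26, 15, 8, 3, 0]

def smooth(freqs):
    """Average each frequency with its neighbours above and below (missing neighbours count as 0)."""
    padded = [0] + list(freqs) + [0]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]

smoothed = smooth(FREQUENCIES)
for midpoint, value in zip(MIDPOINTS, smoothed):
    print(midpoint, round(value, 2))   # e.g. 22 -> 13.33, 57 -> 3.67
```

Because every nonzero frequency is counted in exactly three averages and each average is divided by three, the smoothed values still total 162.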

Another graphic method of presenting the same frequency distribution 
is shown in Figure 3, in which rectangles replace the plotted points used 
in a frequency polygon. A graph of this type is called a histogram. 


Fig. 3. Frequency Distribution in a Histogram. (Vertical axis, Frequency; base line, Vocabulary Score.)


Another type of graphic representation sometimes used in education 
and psychology results from plotting the cumulative frequency distribu- 
tion. A cumulative frequency curve so formed is called an ogive. In col- 
umn (4) of Table 4, the cumulative frequency is shown. According to 
this table, 4 cases occur below 15; 12 below 20; 24 below 25; and so on. 
Figure 4 is then plotted from these cumulative frequencies by plotting 
the corresponding number of cases above the upper limit of each interval, 
i.e., at 15, 20, 25, and so on. Probably the greatest use of this type of 
curve is found in computing percentile values.¹ When this is the purpose,
the distance occupied by the ordinate of the curve on the right-hand side
of the figure is divided into 100 equal parts, as in Figure 4. In this
manner, percentile ranks may be read directly from the graph of
cumulative frequencies.

Fig. 4. Ogive from Table 4. (Left scale, cumulative frequency, 0 to 160; right scale, per cent, 0 to 100; base line, Vocabulary Scores.)

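Reading a percentile rank off the ogive amounts to straight-line interpolation between its plotted points. The sketch below follows the text in plotting the cumulative frequencies at 15, 20, 25, and so on, with the curve starting from zero at 10. The function name and that starting point are assumptions. A reader who prefers exact limits (14.5, 19.5, ...) would shift the score list by half a point.

```python
from bisect import bisect_right

# Points of the ogive: cumulative frequencies from column (4) of Table 4,
# plotted at the upper limits 15, 20, 25, ..., starting from zero at 10.
SCORES = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
CUM_FREQ = [0, 4, 12, 24, 44, 75, 110, 136, 151, 159, 162]

def percentile_rank(score):
    """Per cent of cases below a score, read off the ogive by straight-line interpolation."""
    n = CUM_FREQ[-1]
    if score <= SCORES[0]:
        return 0.0
    if score >= SCORES[-1]:
        return 100.0
    i = bisect_right(SCORES, score) - 1      # score lies between SCORES[i] and SCORES[i + 1]
    x0, x1 = SCORES[i], SCORES[i + 1]
    y0, y1 = CUM_FREQ[i], CUM_FREQ[i + 1]
    cases_below = y0 + (y1 - y0) * (score - x0) / (x1 - x0)
    return 100 * cases_below / n

for s in (20, 30, 32.5, 45):
    print(s, round(percentile_rank(s), 1))
```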
Whenever data are presented in a graph rather than in a table, loss 
in accuracy ensues. However, a gain in ease of understanding is made
which is especially important for the lay person. It may be true that 
graphs have been over-used in scientific reports intended for research 
workers and under-used in reports intended for enlightening the general 
public. 


TWO-WAY DISTRIBUTIONS 


In certain studies, the relationship between pairs of measurements can 
be shown in a two-way frequency distribution, or correlation chart. The
scores on the American Council on Education Psychological Examination
and first-quarter average marks for 260 engineering students are shown
in such a two-way frequency distribution in Table 5. The entries in this
table represent the number of students who obtained test scores within
the interval shown and received average marks within the corresponding
interval. Thus 15 students obtained test scores from 120 to 129 and
received average marks from 2.4 to 2.7. An inspection of this table
suggests some relationship between these A.C.E. scores and achievement
averages among engineering students. Appropriate statistical analysis
will be shown later. The use of percentage values to indicate the
relationship in a two-way distribution is too amateurish a procedure for
research workers to employ.

¹ Percentile values are discussed in Chapter 3.


TABLE 5. First-Quarter Average Marks and Scores on American Council
on Education Psychological Examination

                      FIRST-QUARTER AVERAGE MARK
A.C.E.   0.0- 0.4- 0.8- 1.2- 1.6- 2.0- 2.4- 2.8- 3.2- 3.6-
SCORE    0.3  0.7  1.1  1.5  1.9  2.3  2.7  3.1  3.5  3.9  TOTAL


160-169 3 3 
150-159 1 2 1 4 
140-149 1 ds oS 2 3 10 
130-139 1 1 1 2 4 4 5 8 26 
120-129 1 3 10 15 6 6 5 46 
110-119 1 5 8 7 13 4 1 2 41
100-109 4 4 11 7 10 7 1 44
90-99 1 3 1 7 3 8 10 4 4 41 
80-89 1 2 7 3 7 2 8 25 
70-79 2 2 2 1 2 2 11 
60-69 1 1 3 5 
50-59 1 1 2 
40-49 2 2 
Total 5 9 10 29 31 44 55 33 21 22 260 
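The following sketch shows how a two-way distribution like Table 5 can be built from paired observations. The individual students' pairs behind Table 5 are not given, so the pairs used here are hypothetical, and so are the function names. Score intervals are 10 points wide. Mark intervals are 0.4 wide (0.0-0.3, 0.4-0.7, ...). To avoid floating-point drift, marks are handled in tenths.

```python
from collections import Counter

def score_interval(score, height=10):
    """Lower limit of the test-score interval (e.g. 120 for 120-129)."""
    return (score // height) * height

def mark_interval(mark):
    """Lower limit of the average-mark interval (0.0-0.3, 0.4-0.7, ...)."""
    tenths = round(mark * 10)            # work in tenths to avoid float drift
    return (tenths // 4) * 4 / 10

def two_way_table(pairs):
    """Count students in each (score interval, mark interval) cell."""
    return Counter((score_interval(s), mark_interval(m)) for s, m in pairs)

# Hypothetical (test score, average mark) pairs, not the data of Table 5
pairs = [(125, 2.5), (128, 2.6), (121, 2.4), (95, 1.8), (163, 3.7)]
cells = two_way_table(pairs)
for (score_low, mark_low), count in sorted(cells.items(), reverse=True):
    print(f"{score_low}-{score_low + 9}  {mark_low:.1f}-{mark_low + 0.3:.1f}  {count}")
```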


Another type of two-way distribution, which is of particular impor- 
tance in economics, is the time series. A tabular example of this type of 
classification is shown in Table 6 and the same example is shown graph- 
ically in Figure 5. The percentage of Iowa schools in which superintend- 
ent turnover occurs is shown for the years 1905 to 1952. The long-time 
trend for a decreasing turnover seems obvious. The fluctuations due to 
periods of prosperity, depression, and war could be made almost
indiscernible by shrinking or expanding the coordinates employed in
constructing the graph. In any case, the turnover of superintendents in Iowa
school districts maintaining high schools has decreased, but the rate of 
decrease is gradually becoming smaller. 

TABLE 6. Percentage of Turnover of Superintendents in
Iowa School Districts Maintaining High Schools

         PER CENT            PER CENT
YEAR     TURNOVER    YEAR    TURNOVER
1905       40.9      1930      27.2
1906       36.9      1931      21.7
1907       42.4      1932      18.3
1908       45.3      1933      16.9
1909       43.3      1934      19.2
1910       44.1      1935      20.8
1911       45.6      1936      21.2
1912       43.2      1937      22.7
1913       43.2      1938      20.4
1914       42.8      1939      19.6
1915       36.1      1940      17.5
1916       34.8      1941      18.6
1917       38.8      1942      28.2
1918       52.1      1943      36.1
1919       44.3      1944      35.7
1920       45.3      1945      30.5
1921       41.8      1946      30.7
1922       38.5      1947      32.9
1923       34.2      1948      29.6
1924       34.6      1949      22.5
1925       28.8      1950      21.0
1926       26.8      1951      23.6
1927       24.4      1952      19.8
1928       25.5

Fig. 5. Graph of Percentage Turnover of Iowa School Superintendents. (Vertical axis, per cent turnover; base line, years 1905 to 1952.)

Whenever a time series is reported, the most useful interpretation
occurs when estimates are extrapolated into the future. Such inferences,
although open to serious question, represent in many cases the best
estimates of future conditions. Public utilities have utilized this type of
analysis, projecting supply and demand as many as thirty years into 
the future. In education, the school building programs in many cities 
have been partially determined by extrapolation from time series anal- 
ysis. Time series analysis, as here suggested, is not included in this book. 
The interested research worker will do well to consult some textbook in 
statistical method particularly focused on economics. 


THE NORMAL CURVE 


The distribution assumed to characterize most educational and
psychological variables is the familiar normal curve shown in Figure 6.
A normal curve is a symmetrical, bell-shaped curve, high in the center.
From the center, the height of the curve decreases slowly at first and
then more rapidly until it is about three-fifths of its central height,
after which the rate of decrease becomes smaller and smaller until it is
imperceptible.

Fig. 6. The Normal Curve. (Vertical axis, Frequency; base line, Size of Variable.)

The assumption that most human traits tend to follow an approximate 
normal distribution has been widely accepted, since little evidence to 
the contrary has been found. On the other hand, a normal distribution of 
all educational or psychological data cannot be assumed. As an example, 
the distribution of size of municipalities in a state is known to be non- 
normal. 
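The shape described for Figure 6 follows from the equation of the normal curve, y = e^(−(x−μ)²/2σ²) / (σ√(2π)). The curve falls most steeply one standard deviation from the center. At that point its height is e^(−1/2), or about 0.607, of the central height. The short Python check below confirms both facts numerically. The function names are hypothetical.

```python
import math

def normal_height(x, mu=0.0, sigma=1.0):
    """Ordinate of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

def slope(x, h=1e-5):
    """Central-difference estimate of the slope of the standard normal curve at x."""
    return (normal_height(x + h) - normal_height(x - h)) / (2 * h)

center = normal_height(0.0)
ratio_at_one_sigma = normal_height(1.0) / center   # exp(-1/2)

# Where is the curve falling fastest? Search a fine grid to the right of center.
grid = [i / 1000 for i in range(1, 4001)]          # 0.001 to 4.000
steepest = min(grid, key=slope)                    # most negative slope

print(round(ratio_at_one_sigma, 3))   # 0.607
print(steepest)                       # about 1.0, one standard deviation from center
```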

It is, of course, possible for a trait that is normally distributed to
appear nonnormally distributed if it is evaluated by a test which is too
easy or too difficult. A test that is too easy crowds the scores toward
the top, and the curve will be skewed to the left, as in Figure 7; a test
that is too difficult crowds them toward the bottom, and the curve will
be skewed to the right, as in Figure 8. The distribution of test scores
does not yield conclusive evidence of normality or nonnormality in the
characteristic to be evaluated. It has been assumed whenever necessary
in this book that human traits tend to follow a normal curve and, if the
distribution of scores fails to follow a normal curve, the instrument of
evaluation for many purposes may be open to question unless and until
these scores have been normalized.

Fig. 7. Nonnormal Distribution Reflecting Negative Skewness to the Left. (Vertical axis, Frequency; base line, Size of Variable.)

Fig. 8. Nonnormal Distribution Reflecting Positive Skewness or Skewness to the Right. (Vertical axis, Frequency; base line, Size of Variable.)

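The effect of a test that is too easy or too difficult can be illustrated without random numbers. The sketch starts from an exactly symmetrical set of hypothetical "true" trait values: evenly spaced quantiles of a normal curve with mean 50 and standard deviation 10. It then caps them at a ceiling (too easy) or raises them to a floor (too difficult). The limits 55 and 45 and all names are illustrative assumptions. Skewness is measured by the moment coefficient, the mean cubed deviation divided by the cube of the standard deviation.

```python
from statistics import NormalDist

def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation / SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical trait values: evenly spaced quantiles of a normal curve,
# so the set is exactly symmetrical.
trait = NormalDist(mu=50, sigma=10)
n = 1000
true_scores = [trait.inv_cdf((i + 0.5) / n) for i in range(n)]

# A test that is too easy cannot register anything above its ceiling;
# one that is too difficult cannot register anything below its floor.
too_easy = [min(s, 55) for s in true_scores]
too_hard = [max(s, 45) for s in true_scores]

print(abs(skewness(true_scores)) < 1e-6)   # True: symmetrical
print(skewness(too_easy) < 0)              # True: skewed to the left
print(skewness(too_hard) > 0)              # True: skewed to the right
```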

NUMERICAL UNITS 


Most educational and psychological tests yield numerical scores. The
numerical units comprising these scores are of several different kinds.
For example, they may or may not have a known zero point. First, the zero
point of the scale may not coincide with the zero point of the trait or
characteristic which the scale attempts to measure. An example of this
is provided by the thermometer. On the Fahrenheit scale 0° does not
represent a complete absence of heat. This type of unit, to some extent,
limits the statistical treatment and interpretation which can be made.
For example, if the temperature is 20° F today and was 10° F yesterday,
it cannot be said that it is twice as warm today as it was yesterday.
Another example may indicate further the impossibility of this type of
interpretation. Suppose it is 6° above zero today and yesterday it was
one degree below zero (−1°). Is it minus six times as warm today as it
was yesterday? Thus, it will be seen that a point on a scale cannot be
interpreted in terms of its ratio to another point on the scale unless
zero on the scale corresponds to the zero point of the trait or
characteristic which the scale attempts to measure.
Unfortunately, units of this type are numerous in the fields of
education and psychology. Measurements of achievement may all be so
classified. For example, if a pupil makes a score of zero on a test in
geography,
it cannot be said with certainty that he knows nothing about geography. 
It is quite conceivable that he may know a great deal about geogra- 
phy but that his knowledge is of such a simple character that the scale or 
test does not measure it. That is, the zero point of geography achievement 
may lie at some unknown distance below the zero point of the scale which 
purports to measure achievement in geography. 
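As a quick numerical check of the thermometer argument, the sketch below compares the ratio of the two Fahrenheit readings with the ratio of the same readings on the Kelvin scale, whose zero is absolute zero. The Kelvin comparison and the function name `fahrenheit_to_kelvin` are illustrative additions, not part of the original discussion.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvins, a scale whose zero is absolute zero."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

today_f, yesterday_f = 20.0, 10.0

# Ratio on the Fahrenheit scale, whose zero point is arbitrary.
ratio_fahrenheit = today_f / yesterday_f

# Ratio of the same two readings on a scale with a true zero point.
ratio_kelvin = fahrenheit_to_kelvin(today_f) / fahrenheit_to_kelvin(yesterday_f)

print(f"Fahrenheit ratio: {ratio_fahrenheit:.2f}")
print(f"Kelvin ratio:     {ratio_kelvin:.4f}")
```

The Fahrenheit ratio suggests "twice as warm," while the ratio on a scale with a true zero is only about 1.02, which is the sense in which ratio statements require a meaningful zero point.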

Second, numerical units may be classified according to whether the 
unit is constant throughout the scale or whether it varies with reference 
to its location on the scale. For example, the units (the degrees) on the 
thermometer are considered to be equal. In other words, there is just as 
much difference in temperature between 5° and 10° as there is between 
95° and 100°. Unfortunately, it is not always possible to find units of this 
type in education and psychology. Suppose a pretest and a final examination are given in algebra. If one pupil gains from a score of 5 on the pretest to 25 on the final examination, does another pupil who goes from 75 on the pretest to 95 on the final examination make the same gain? This question cannot be answered. Although
each pupil has made a gain of 20 on the scale, evidence is not available 
to indicate that the 20 units at one place on the scale are equal to 20 
units at some other place on the scale. Units of this type are sometimes 
called rubber units. It is, of course, possible to convert these rubber units 
into equal units by appropriate treatment whenever the normality of the 
evaluated trait is not open to serious theoretical objections. Such trans- 
formations are seldom undertaken, since empirical evidence indicates that 
statistical treatment is changed but little by assuming that the numerical 
units may be handled as if they were equal units. 
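The sketch below illustrates how equal raw gains can become unequal once scores are converted to units assumed to be equal. It assumes, purely for illustration, that raw scores are percentile ranks and that the underlying trait is normally distributed, so each score can be re-expressed as a standard normal deviate (z). The scores 50 and 70 are hypothetical and do not come from the algebra example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def percentile_to_z(percentile_rank):
    """Re-express a percentile rank (strictly between 0 and 100) as a normal deviate."""
    return standard_normal.inv_cdf(percentile_rank / 100.0)

def normalized_gain(pretest, final):
    """Gain measured in z units rather than in raw score points."""
    return percentile_to_z(final) - percentile_to_z(pretest)

# Two raw gains of 20 points each, located at different places on the scale.
gain_low = normalized_gain(5, 25)
gain_middle = normalized_gain(50, 70)

print(f"5 -> 25:  {gain_low:.3f} z units")
print(f"50 -> 70: {gain_middle:.3f} z units")
```

Under this particular assumption the text's own pair, 5 to 25 and 75 to 95, would happen to give equal gains, because the normal curve is symmetric about the 50th percentile; the point is only that equality of raw gains cannot be taken for granted.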


DESCRIPTIVE UNITS 


Many variables are evaluated and reported in descriptive units. Per- 
haps the most frequently encountered is course achievement, which at 
many universities is reported as A, B, C, D, and F. It has been the 
usual practice to arbitrarily assign values of 4, 3, 2, 1, and 0, respectively, 
to these letter marks. Thus the assumption is made that the difference 
in achievement between A and B is equal to that between B and C, 
and so forth. Such an arbitrary decision has also been made in most 
achievement tests as well as in attitude and interest scales. In many 
attitude scales the possible item responses are strongly favorable, favor- 
able, uncertain, unfavorable, and strongly unfavorable. To these responses 
numerical values of 5, 4, 3, 2, and 1 are usually assigned. In a similar 
manner, values of 1 and 0 are usually assigned to correct and incorrect 
responses for each item in a multiple-choice achievement test regard- 
less of the difficulty of the item. 
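The conventional coding just described can be written down directly. The sketch assigns 4, 3, 2, 1, and 0 to the marks A through F and 5 through 1 to the attitude categories, then averages the codes; the particular marks and responses are hypothetical.

```python
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
ATTITUDE_CODES = {
    "strongly favorable": 5,
    "favorable": 4,
    "uncertain": 3,
    "unfavorable": 2,
    "strongly unfavorable": 1,
}

def mean_code(responses, coding):
    """Average of the numerical codes assigned to descriptive responses."""
    codes = [coding[response] for response in responses]
    return sum(codes) / len(codes)

marks = ["A", "B", "B", "C", "F"]  # hypothetical course marks
attitudes = ["favorable", "uncertain", "strongly favorable", "unfavorable"]  # hypothetical

print(mean_code(marks, GRADE_POINTS))
print(mean_code(attitudes, ATTITUDE_CODES))
```

Averaging such codes is meaningful only to the extent that the equal-spacing assumption built into the coding tables holds.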

It is possible to assign to descriptive units values that are not separated by equal unit differences. Perhaps the most frequently used procedure for establishing numerical values for available descriptive units is to assume a normal
distribution of the characteristic to be evaluated. The procedure for 
obtaining the numerical values after making such an assumption is dis- 
cussed in detail in a later chapter. For the present purpose, the principle 
to be emphasized is that the distribution in the categories of a descrip- 
tively expressed variable does not provide conclusive evidence of nor- 
mality in the characteristic evaluated. For example, the rating of two 
products turned out by 50 boys in the same class in woodworking, one 
of which was easy and the other difficult to prepare, might very well 
be as follows: 


PRODUCT   EXCELLENT   GOOD   FAIR   POOR
   A          40         6      3      1
   B           4        15     16     15


No evidence is available from the foregoing distributions to support 
or refute the assumption that ability to turn out a satisfactory product 
on a lathe is a characteristic which is normally distributed. 
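The normal-distribution procedure mentioned above is treated in a later chapter. As a rough preview, the sketch below uses one common version of it, assigning each category the mean standard score of the slice of a normal curve that the category's proportion occupies; whether this matches the book's later procedure in detail is an assumption, and the function name is illustrative. Note that the procedure imposes normality rather than testing for it: any set of positive counts yields a set of scale values.

```python
from statistics import NormalDist

def normalized_category_values(counts):
    """Scale values for ordered categories under an assumed normal trait.

    counts: frequencies ordered from the lowest to the highest category,
    each assumed to be greater than zero. Each category receives the mean
    z-score of the slice of the standard normal curve it occupies.
    """
    std = NormalDist()
    n = sum(counts)
    values = []
    cumulative = 0
    for count in counts:
        lower_p = cumulative / n
        cumulative += count
        upper_p = cumulative / n
        # Normal density at the slice boundaries; the open-ended tails contribute 0.
        lower_density = std.pdf(std.inv_cdf(lower_p)) if lower_p > 0 else 0.0
        upper_density = std.pdf(std.inv_cdf(upper_p)) if upper_p < 1 else 0.0
        values.append((lower_density - upper_density) / (count / n))
    return values

# Woodworking ratings ordered Poor, Fair, Good, Excellent.
product_a = [1, 3, 6, 40]
product_b = [15, 16, 15, 4]
for name, counts in (("A", product_a), ("B", product_b)):
    print(name, [round(v, 2) for v in normalized_category_values(counts)])
```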


DISCRETE AND CONTINUOUS UNITS 


Two types of units may be identified in numerically expressed variables: one is a point or discrete unit, and the other an interval or continuous unit. The point unit occurs whenever all cases lie exactly at a given point and no cases could theoretically lie between two consecutive points. An example of such a discrete unit is the number of children in any one family who are enrolled in the public schools as of any given date. From family to family, the reported information would be in terms of integers, 0, 1, 2, 3, and so on. The possibility of any family having 2.24 children enrolled in school on any given date is ridiculous.

The continuous unit, on the other hand, is an approximation of a value 
the limits of which depend upon the sensitivity of the evaluative instru- 
ment. The heights of adult males may serve as an example. Between 
the limits of the shortest man and the tallest man, no single individual 
height is inconceivable. The heights may be reported to the nearest inch, 
to the nearest one-half inch, or to the nearest one-fourth inch. Regardless 
of the reported values, intermediate values are not only expected but will 
predominate. Perhaps a good idea of a reported value is that it locates 
an imaginary line, like the equator, one-half the values lying above and 
one-half below the reported value. For example, any given individual may be reported as 5 feet 10 inches tall, and yet his actual height is some fractional amount, although perhaps only an infinitesimal amount, away from 5 feet 10 inches. Perhaps no individual is exactly 5 feet 10 inches but rather 5 feet 9.9999 inches or 5 feet 10.0001 inches tall.

The distinction between a discrete and a continuous unit seems obvious 
from the foregoing examples. However, for many numerically expressed 
variables, the designation of discrete or continuous units is not so clear 
cut. The scores on a multiple-choice test of ability to recall facts and 
principles in chemistry most often are obtained by counting the number 
of items answered correctly. With such a 50-item test administered to 
a large number of students, a frequency distribution could be made 
of the number of students who made scores of 0, 1, 2, 3, and so on to 50. 
A score of 20.47 for any given student is impossible. On first thought, 
the score might be considered to be a discrete unit, since scores are not 
available except for whole numbers. Further consideration suggests, 
however, that the ability to recall facts and principles in chemistry is a 
continuous variable, no more accurately here evaluated than by whole 
numbers from 0 to 50. If the scores are so regarded, their distribution 
would be continuous like a distribution of heights, no more accurately 
reported than to an inch. All research workers in education and psychol- 
ogy should recognize that any given test score represents a range of 
values rather than a point on the characteristic continuum which the test 
has been designed to measure. Thus, test scores are accepted as continu- 
ous units rather than discrete units. 

The response to any one of 50 items on such an achievement test in 
chemistry is usually evaluated as correct or incorrect. First thought sug- 
gests that such a response to any single item by any student be scored 
as unity for correct and zero for incorrect. This response should not be 
considered as a discrete unit of one or zero but rather as a range of 
values in chemistry achievement which cannot be more accurately evaluated by a single item than as correct or incorrect. It is apparent to all
concerned with test construction that no item should be included in a test 
which does not help in placing an individual on the characteristic con- 
tinuum which the test is designed to measure. 


INTERVAL LIMITS OF CONTINUOUS UNITS 


It is necessary to consider the method of collecting the data whenever 
the limits of a reported measurement are to be assigned. For some vari- 
ables such as weights of school pupils the reported weight should be 
considered as the mid-point of the interval unit. Thus, a reported weight 
of 97 pounds represents a weight somewhere between 96.5 pounds and 
97.5 pounds, provided the weights are estimated to the nearest pound. The reported weight of 97 pounds represents a weight somewhere between 96.75 pounds and 97.25 pounds if weights are reported to the nearest one-half pound. Likewise, the weight of a load of cattle might very well be estimated to the nearest 10 pounds, in which case the interval limits are 5 pounds above and below the reported weights.
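The rule for values rounded to the nearest unit can be expressed as a small helper. It treats the reported value as the mid-point of its interval; the function name and the 1,200-pound cattle load are hypothetical.

```python
def real_limits(reported, precision):
    """Theoretical limits of a value rounded to the nearest `precision` units.

    The reported value is treated as the mid-point of its interval.
    """
    half = precision / 2.0
    return reported - half, reported + half

print(real_limits(97, 1))     # weight to the nearest pound
print(real_limits(97, 0.5))   # weight to the nearest half pound
print(real_limits(1200, 10))  # hypothetical cattle load, nearest 10 pounds
```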

In certain measurements, however, the reported value represents the 
lower limit of a continuous unit. An excellent example is provided by 
most reported ages of people. It has been customary for individuals to 
report their ages as of last birthday. Thus, a 20-year-old has passed his 
20th but has not reached his 21st birthday. Such reported ages are the 
lower limit values rather than the mid-point values of a continuous unit. 
Actually a good estimate of the average age of all reported 20-year-old 
individuals is approximately 20 and one-half, the mid-point of this unit 
whenever years are reported to the last birthday. If the average age of three boys 9, 10, and 11 years of age is desired, probably the best estimate is 10 and one-half years rather than 10 years, which would be found by averaging the three reported ages. If ages are reported to the nearest birthday, as is required by insurance companies, the reported value represents the mid-point of a unit rather than the lower limit.


TABLE 7. Methods for Designating Intervals in the Tabular
Presentation of a Frequency Distribution

WEIGHTS OF CHILDREN

       REPORTED VALUES              THEORETICAL VALUES
  LOWEST AND     LOWEST        LOWEST AND       MID-
  HIGHEST        ONLY          HIGHEST          POINT      FREQUENCY
    (1)           (2)             (3)            (4)
  130-139         130         129.5-139.5       134.5          2
  120-129         120         119.5-129.5       124.5          6
  110-119         110         109.5-119.5       114.5         12
  100-109         100          99.5-109.5       104.5         36
   90-99           90           89.5-99.5        94.5         42
   80-89           80           79.5-89.5        84.5         32
   70-79           70           69.5-79.5        74.5         18
   60-69           60           59.5-69.5        64.5          ?
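Returning to the ages discussed just before Table 7: when ages are reported as of the last birthday, each reported value is the lower limit of a one-year interval, so the mid-point estimate for each person is the reported age plus one-half year. A minimal sketch; the function and parameter names are illustrative.

```python
def estimated_mean_age(reported_ages, reported_at_last_birthday=True):
    """Estimate the mean age from ages reported in whole years.

    If ages are reported as of the last birthday, each reported value is the
    lower limit of a one-year interval, so the interval mid-point is age + 0.5.
    If ages are reported to the nearest birthday, the reported value is
    already the mid-point.
    """
    offset = 0.5 if reported_at_last_birthday else 0.0
    midpoints = [age + offset for age in reported_ages]
    return sum(midpoints) / len(midpoints)

boys = [9, 10, 11]
print(estimated_mean_age(boys))                                   # last birthday
print(estimated_mean_age(boys, reported_at_last_birthday=False))  # nearest birthday
```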

In a frequency distribution, several alternate methods are available, as shown in Table 7, for designating the limits of an interval. The data used are weights of a group of school children, estimated to the nearest pound and classified into intervals of 10 pounds each. In making a frequency distribution, more speed and accuracy are possible by using reported values: either the lowest and highest reported values in an interval, as in column (1), or only the lowest reported value, as in column (2). In presenting a frequency distribution to a reader, either of the two foregoing methods may be used, or the lowest and highest theoretical values in an interval, as in column (3), or, perhaps, the mid-point of the theoretical values, as in column (4).

From the standpoint of the consumer of research information, the stu- 
dent should be familiar with the meaning of all four of these plans. For 
presenting such distributions, the use of the lowest and highest reported
values, as shown in column (1), is recommended and is used throughout
this book. 
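The four columns of Table 7 can all be generated from the lowest reported value of each interval. A sketch, assuming weights recorded to the nearest pound (precision 1) and intervals 10 pounds wide, as in the table; the function name is illustrative.

```python
def interval_designations(lowest_reported, width=10, precision=1):
    """Return the four Table 7 designations for one interval.

    (1) lowest and highest reported values, (2) lowest reported value only,
    (3) lowest and highest theoretical values, (4) theoretical mid-point.
    """
    highest_reported = lowest_reported + width - precision
    lower_theoretical = lowest_reported - precision / 2
    upper_theoretical = lower_theoretical + width
    midpoint = (lower_theoretical + upper_theoretical) / 2
    return (
        f"{lowest_reported}-{highest_reported}",
        lowest_reported,
        (lower_theoretical, upper_theoretical),
        midpoint,
    )

for low in range(130, 50, -10):  # 130, 120, ..., 60
    print(interval_designations(low))
```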


COMPOUND UNITS 


The use of compound units is commonplace not only in research but in 
everyday life. All motorists recognize the usefulness of the miles per gal- 
lon unit of gasoline consumption. Such a compound unit is required be- 
cause the amount of use which an automobile receives varies from car 
to car. The unit, miles per gallon, represents an attempt to equate the 
gasoline consumption information from one car to another. Rigid research 
techniques suggest that all cars should be given the same amount of 
gasoline and that the distance traveled on that constant amount of gaso- 
line be used as the unit of measurement. Although it is theoretically more 
desirable to replace a compound unit with a simple unit whenever possible, summarization of existing information in which varying amounts of exposure exist demands compound units.


$$\text{Miles per gallon} = \frac{\text{Distance}}{\text{Number of gallons}}$$


The characteristic demanded of a satisfactory compound unit is a 
linear relationship between the numerator and denominator which pro- 
duce the compound unit. Thus, for any given car, if the mileage is 
plotted on the Y-axis and the gasoline used on the X-axis, the relation- 
ship would tend to follow a straight line. To the degree that a compound 
unit does not consist of two linearly related characteristics, it is open to 
serious question. 
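One informal check of this requirement is to compute the compound unit at several amounts of exposure and see whether it stays roughly constant. Strictly speaking, a constant ratio requires the straight line to pass through the origin. All figures below are hypothetical.

```python
def compound_units(numerators, denominators):
    """Compound unit (numerator per unit of denominator) for each observation."""
    return [num / den for num, den in zip(numerators, denominators)]

# Hypothetical records for one car: gallons used and miles driven.
gallons = [5, 10, 15, 20]
miles = [100, 200, 300, 400]  # rises in a straight line with gallons
print(compound_units(miles, gallons))

# Hypothetical curvilinear case: output levels off as exposure grows.
exposure = [5, 10, 20, 40]
output = [300, 600, 1100, 1800]
print(compound_units(output, exposure))
```

In the first case the compound unit is the same at every level of exposure; in the second it drifts downward, so comparisons made at different exposures would be misleading.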

In many measurements required for research problems in education 
and psychology a compound unit must be employed. The intelligence 
quotient furnishes a good example. It is obtained by dividing an indi- 
vidual’s mental age, as indicated by some test, by his chronological age 
and multiplying by 100. The average individual would have an I.Q. of 100. The numerator for this compound unit, the mental age, must be
found from some intelligence test. For school children, each succeeding 
year will indicate some increase in mental age. If mental age and chrono- 
logical age are plotted, the relationship may be shown as follows: 


[Graph: Mental Age (vertical axis) plotted against Chronological Age, 0 to 50 years (horizontal axis)]


The shape of the curve will depend upon both the particular test used 
and the individual tested. In general it may be said that performance on 
such tests increases with age, almost in a straight line until the age of 
12 or 13 years after which the rate of increase becomes smaller and 
smaller until a maximum occurs probably between 20 and 25 years fol- 
lowed by a gradual decline. An inspection of the graph indicates that 
the quotient of the mental age and the chronological age is not a satisfactory unit for all ages between 5 and 40 years. Psychologists have long recognized this limitation¹ and have followed the procedure of not increasing the denominator (chronological age) beyond 15 years or at most 16 years. The assumption then follows that the relationship between mental age and chronological age is linear between the ages of 5 and 15 years. From the graph shown, this assumption, although probably not entirely correct, yields a reasonably satisfactory compound unit for individuals between these ages.
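The ratio I.Q. and the practice of not letting the chronological age in the denominator exceed a fixed ceiling can be sketched as follows. The 15-year ceiling follows the text (16 is also mentioned); ages are in years for simplicity, and the function name is illustrative.

```python
def ratio_iq(mental_age, chronological_age, ca_ceiling=15.0):
    """Ratio I.Q.: mental age divided by chronological age, times 100.

    The chronological age used as the denominator is not allowed to exceed
    ca_ceiling (the text mentions 15, or at most 16, years).
    """
    denominator = min(chronological_age, ca_ceiling)
    return 100.0 * mental_age / denominator

print(ratio_iq(10, 10))  # average 10-year-old
print(ratio_iq(12, 10))  # 10-year-old performing like a typical 12-year-old
print(ratio_iq(15, 30))  # adult, with the ceiling applied
print(ratio_iq(15, 30, ca_ceiling=float("inf")))  # adult, no ceiling
```

Without the ceiling, an adult whose test performance had merely stopped rising would appear to decline steadily in I.Q.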

Some achievement measurements, such as typing speed, are expressed 
in compound units. The number of words per minute appears satisfac- 
tory as a unit, since the relationship between the number of words typed 
and the typing time is reasonably linear unless the typing time is so long that fatigue becomes an important factor.

Compound units are frequently employed in school administration studies. Probably the most frequently used of such units is per pupil cost. This unit is often used to equate for size of school. School cost tends to increase as the size of school increases, but the relationship is not linear, as indicated by the curve shown in the following graph.

¹ For a discussion of the I.Q., the student should consult such a reference as David Wechsler, The Measurement of Adult Intelligence (Baltimore, The Williams and Wilkins Company, 1944).

For many purposes, per pupil cost is an unsatisfactory unit. When it 
is used, interpretations must be made with caution. For example, it has 
long been recognized that conducting a satisfactory public school is accompanied by considerable cost. A program of a given quality, however, will usually have a lower per pupil cost in a large school system than in a small one. Thus the unit of per pupil cost in a statewide study is not
necessarily comparable among large and small schools. 
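To see why per pupil cost is not comparable across school sizes, suppose, purely hypothetically, that total cost for a program of fixed quality rises less than proportionally with enrollment. The cost function and its constants below are invented for illustration only.

```python
def total_cost(pupils):
    """Hypothetical total cost that grows less than proportionally with size."""
    return 5000.0 * pupils ** 0.8

sizes = [50, 200, 800, 3200]
per_pupil_costs = [total_cost(n) / n for n in sizes]

for n, cost in zip(sizes, per_pupil_costs):
    print(f"{n:5d} pupils: total {total_cost(n):12,.0f}  per pupil {cost:8,.2f}")
```

Because the total-cost curve bends, per pupil cost falls steadily as size increases even though the program is the same, so a low per pupil cost in a large system says nothing by itself about economy or quality.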


[Graph: Total School Cost (vertical axis) plotted against Size of School (horizontal axis)]


Often research studies will employ a compound unit which is used as 
a simple unit. For example, typing achievement in terms of words per 
minute becomes a simple unit when all students are tested for the same 
number of minutes. In a similar way, per pupil cost becomes a simple 
unit whenever all schools have approximately the same number of pupils. 
Whenever compound units are so used, they are satisfactory for that 
particular study, but they may or may not be useful if comparison is 
made with another study in which the exposure, shown in the denomina- 
tor, is different. 

Defining terms, choice of units, and shape of distributions are basic 
considerations in the planning of any research study. If too little atten- 
tion is given to these details, no statistical analysis of the data is likely 
to yield conclusions in which confidence can be placed. 


Exercises 


1. Classify each of the following according to whether enumeration or meas- 
urement data are involved: place of residence; strength of grip; attitude toward 
war; religious affiliation; socioeconomic status; political party registration;
reaction time; achievement; votes for each of two candidates for an office. 

2. In each of the following exercises a research problem is described, the solu- 
tion of which necessitates a definition of one or more terms. Formulate a prac- 
tical definition for each of the terms appearing in italics. 


An investigation was made of the relationship between
a. Retardation and delinquency.
b. Scholastic aptitude, hours spent in study, veterans status, and scholastic
achievement of college freshmen.
c. Level of aspiration in a course, class attendance, and intelligence of 
psychology students. 
3. The following data represent the number of male and female psychology 
majors in three schools: 


                 SCHOOL
SEX          A      B      C     TOTAL
Male        120    100    180     400
Female       40    100     60     200
Total       160    200    240     600


How should percentages be computed to make the following comparisons
most meaningful to a reader?
a. The percentage of males and females within each school. 
b. The percentage of each sex attending the three schools. 
c. The percentage of the total of students attending each school by sex. 
4. Prepare a frequency distribution from the following intelligence quotients: 


138, 135, 132, 132, 130, 127, 126, 123, 122, 121, 121, 120, 120, 119, 118, 117, 117, 
116, 116, 115, 114, 114, 114, 113, 118, 112, 112, 111, 111, 111, 110, 109, 108, 108, 
107, 107, 106, 106, 106, 105, 105, 105, 104, 104, 104, 103, 103, 103, 103, 103, 103, 
102, 102, 102, 102, 101, 101, 101, 101, 101, 101, 101, 100, 100, 100, 100, 100, 99, 
99, 99, 99, 99, 98, 98, 98, 97, 97, 97, 97, 96, 96, 96, 95, 94,
93, 92, 92, 91, 90, 89, 88, 87, 86, 85, 85, 84, 82, 81, 79, 78, 77.


5. Superimpose a frequency polygon, a histogram, and a smoothed curve on the data in Exercise 4.
6. What are the upper and lower theoretical limits of a 2-inch interval for heights collected to the nearest half-inch, if the lower reported limit of the interval is 72 inches? What is the mid-point of this interval?
7. Which of the following are compound units and which are simple units?
a. Mental age
b. Gain as defined by the difference between pretest and final test scores
c. Per cent of gain lost in retention experiments
d. Reaction time
e. Population density defined as persons per square mile


2


Measures of Central Tendency


Since the human mind is unable to comprehend readily the significance 
of unorganized masses of data, it is helpful to characterize or to describe 
the data in terms of one or more summary values which stand in place 
of the numerous individual values represented by the data. Such sum- 
mary values may include the median, the mean, the range, the standard 
deviation, and so forth. The appropriate summary measure is determined 
by the purpose in view in characterizing or describing the data. In many 
cases the purpose demands that more than one measure be selected. It 
is obvious that no single value can ever describe completely any given 
mass of data. 

That a complete description cannot be obtained in terms of one single 
value or statement is an unfortunate limitation, by no means confined 
to numerical data. If a book is to be described, it may be said to con- 
tain 600 pages. This description in itself may be sufficient for the pur- 
pose at hand. Again, it may be said that a book has a red cover, contains 
70 charts, is divided into 10 chapters, and so on. Depending upon the 
purpose in describing the book, one or more of the foregoing character- 
istics may be chosen. It is obvious that no one characteristic can be 
presented which will give a complete description of the book. In a similar 
way, a description of a group of values may be made by pointing out 
one or more characteristics of those values.

One of the most common ways of describing values in a distribution 
is by reporting some measure of central tendency. In most distributions 
it can be noticed that the values tend to cluster around certain points 
in the distribution. A measure of central tendency may be computed 
which will represent some point, or some value that reflects the clustering 
of the data. Many writers call the various measures of central tendency 
averages. When the word average is used in this sense, it should not be 
confused with the popularly used average, meaning the sum of all the
values divided by the number of cases in the distribution. Furthermore,
in everyday conversation, the word average is used to express another 
concept. For example, expressions are heard that “the average male 
teacher receives more salary than the average female teacher.” The con- 
cept that the term average means the sum of all the values divided by 
the number of them, is, in this case, entirely absent. It is necessary 
whenever the word average has been employed to consider carefully just 
what meaning the writer had in mind when he used this term. 

Although many measures of central tendency have been recognized and 
used, only three of these measures are of sufficient importance, judged by 
frequency of use, to be considered. These three measures are (1) the 
mean, (2) the median, and (3) the mode. Each of these measures differs 
somewhat in the method of computation, and each represents a some- 
what different concept when interpreted for purposes of description. The 
appropriate measure to be used depends upon the interpretation which is 
to be drawn from the data. Examples are all too prevalent in educational 
and psychological literature where one measure has been computed and 
interpretations made which are appropriate for some other measure than 
the one which has been computed. The appropriate interpretation will 
be discussed under each of the measures of central tendency. 


THE MEAN 


The mean is the sum of all the values in a distribution divided by the 
number of these values. It will be noted that this measure of central tendency is the familiar measure often called the average or arithmetic mean. It will be seen that the mean, when computed, need not be a value that actually occurs in the distribution. From the definition of the mean, it is evident that the product of the mean and the number of cases yields the total of the values in the distribution. Thus, if the mean salary for teachers in a city is $3,800 and there are 100 teachers in the city, the total payroll will be $380,000. With no other measure of central tendency can this procedure be followed.

The mean is a measure of central tendency the computation of which is influenced by the actual size of each value in a distribution. Consequently, whenever the description of the data demands that the measure of central tendency be responsive to the size of each value in the distribution, the mean is the appropriate measure to use.

Computation of the Mean. The procedure for computing the mean is simple. It is well known that adding together all the values in a distribution and dividing by the number of these values will yield the mean.


The formula used is

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$$

which is usually written

$$\bar{X} = \frac{\sum X}{N}$$

Here $\bar{X}$ stands for the mean, $X$ a value in the distribution, $N$ the number of values, and $\sum$ (capital sigma) the sum. The formula then reads:
The mean is equal to the sum of the values divided by the number of 
cases in the distribution. The foregoing formula is used when computing 
the mean from ungrouped data, particularly when an automatic calcu- 
lator is available. 
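For ungrouped data the formula translates directly. The sketch also checks the property noted earlier, that the mean multiplied by N recovers the total; the five salaries are hypothetical.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their number (N)."""
    return sum(values) / len(values)

salaries = [3200, 3500, 3800, 4100, 4400]  # hypothetical teacher salaries
m = mean(salaries)

print(m)                  # mean salary
print(m * len(salaries))  # mean times N equals the total payroll
```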

In spite of the simplicity of this well-known method of computing the 
mean, it is essential that some computational scheme be available when 
the values have been grouped into a frequency distribution. Should an 
investigator want to compute the mean of measurements made on a large
number of cases and have no calculator at his disposal, the data are
sometimes arranged in a frequency distribution from which the mean is 
calculated. Also, it is occasionally desirable to find the mean of data 
available only as a frequency distribution. 

In Table 8 are shown the intelligence quotients obtained by 167 high school pupils.


TABLE 8. Intelligence Quotients of 167 Pupils

I.Q. INTERVALS    FREQUENCY
   145-149            1
   140-144            3
   135-139            5
   130-134            8
   125-129           11
   120-124           17
   115-119           21
   110-114           22
   105-109           24
   100-104           20
    95-99            15
    90-94            12
    85-89             6
    80-84             2


In constructing the frequency distribution shown in this
table, the number, or frequency, of cases falling between the limits of 
each class interval has been tabulated. To understand the computation 
of statistical measures from a frequency distribution requires knowledge 
of the upper and lower theoretical limits of the interval employed. 
Since intelligence quotients are obtained by rounding off to the nearest whole number, the lower theoretical limit of each interval is 0.5 lower than the reported lower limit which appears in the table. Thus the lowest interval extends from 79.5 up to 84.5, with a mid-point of 82 rather than 82.5.

In situations in which the data have been collected to the last whole number, as is usually the case with objective achievement test scores, the theoretical and the reported lower limits of each interval are identical.

The mean is computed from a frequency distribution in three steps, 
viz., (1) assuming some mean, (2) computing the amount of error occa- 
sioned by guessing the mean, and (3) adjusting the assumed mean for 
the correction found necessary from guessing the mean. 

Let 107 be assumed as the mean, lying at the mid-point of the interval with a lower reported limit of 105 and a lower theoretical limit of 104.5. Then the number of intervals by which each interval lies away from the interval with a lower reported limit of 105 is entered under the column heading d. The deviations above the assumed mean are given a positive sign and the deviations below the assumed mean are given a negative sign. So far as the 24 cases in the interval with a lower reported limit of 105 are concerned, the guessed mean needs no correction. For the 22 cases in the +1 interval a correction would be necessary, which is not quite offset by the 20 cases lying in the -1 interval. In each interval, the number of cases is multiplied by the deviation and the product put in the column headed fd. The sum of the positive fd is 258 and of the negative fd is -120, so the algebraic sum is +138.

I.Q. INTERVALS     f      d      fd
   145-149         1      8       8
   140-144         3      7      21
   135-139         5      6      30
   130-134         8      5      40
   125-129        11      4      44
   120-124        17      3      51
   115-119        21      2      42
   110-114        22      1      22
   105-109        24      0       0
   100-104        20     -1     -20
    95-99         15     -2     -30
    90-94         12     -3     -36
    85-89          6     -4     -24
    80-84          2     -5     -10
   Total         167            +138

If the algebraic sum had been zero, the guessed mean would be the mean. However, since Σfd = +138, the mean lies above the guessed mean by 138/167 of an interval; or, since each interval is 5, 138/167 times 5, approximately 4.13, is the amount which must be added to the guessed mean to produce the mean. It follows that the mean is 111.13. The formula, then, for computing the mean in a frequency distribution is


$$\bar{X} = \text{G.M.} + \left(\frac{\sum fd}{N}\right)h$$

where $\bar{X}$ is the mean; G.M. is the guessed mean; $\sum fd$ is the algebraic sum of the products of the frequency and the deviation; $N$ is the number of cases in the distribution; and $h$ is the height of the interval. Substitution into the formula yields

$$\bar{X} = \text{G.M.} + \left(\frac{\sum fd}{N}\right)h = 107 + \left(\frac{138}{167}\right)(5) = 111.13$$
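The three steps (guess a mean, find the correction, adjust) can be written out for Table 8. A sketch, assuming intervals identified by their reported lower limits, a common height h = 5, values rounded to the nearest whole number, and a guessed mean equal to the mid-point of one interval; the function and argument names are illustrative.

```python
# Table 8: reported lower limit of each I.Q. interval and its frequency.
TABLE_8 = [
    (145, 1), (140, 3), (135, 5), (130, 8), (125, 11), (120, 17), (115, 21),
    (110, 22), (105, 24), (100, 20), (95, 15), (90, 12), (85, 6), (80, 2),
]

def grouped_mean(table, guessed_mean, h=5, precision=1):
    """Mean of a frequency distribution by the guessed-mean method."""
    n = sum(f for _, f in table)
    sum_fd = 0
    for lower_reported, f in table:
        # Theoretical limits start precision/2 below the reported lower limit.
        midpoint = lower_reported - precision / 2 + h / 2
        d = round((midpoint - guessed_mean) / h)  # deviation in whole intervals
        sum_fd += f * d
    return guessed_mean + (sum_fd / n) * h, sum_fd

mean_value, sum_fd = grouped_mean(TABLE_8, guessed_mean=107)
print(sum_fd)                # algebraic sum of fd
print(round(mean_value, 2))  # the mean
```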

Perhaps a better understanding of the formula for computing the mean 
from grouped data can be obtained by considering the meaning of the en- 
tries under the column heading d. An understanding of these entries requires 
a knowledge of two mathematical principles. The first principle may be 
stated as follows: Whenever a constant is subtracted from each value in 
a distribution, the mean of the original values can be found by computing 
the mean of the modified values and adding the constant to this mean. 
For example, the mean of the distribution 6, 8, and 10 is 8. If the con- 
stant 2 is subtracted from each original value, a mean of 6 for the modi- 
fied values results. The constant 2 must be added to this modified mean, 
6, to produce the mean of the original values, 8. The second principle is: 
Whenever each value in a distribution is divided by a constant, the mean 
of the original distribution can be found by multiplying the mean of the 
modified values by the constant. Thus, if each value in the distribution 
6, 8, and 10 is divided by the constant 4, the mean of the modified values 1.5, 2, and 2.5 is 2. Multiplying this mean 2 by the constant 4 produces the mean of the original values, 8. If the original values 6, 8, and 10 are first modified by subtracting the constant 2, and then by dividing each result, 4, 6, and 8, by the constant 4, the mean of the resulting values 1, 1.5, and 2 is 1.5. The mean of the original values can be obtained by first multiplying the mean 1.5 by the constant 4 and then adding this product to the constant 2. Thus 2 + (1.5)(4) = 8. The same principles are involved in the computation of the mean from grouped data. The entries in the column d may be produced as follows: the guessed mean, a constant, is subtracted from the mid-point of each interval. Each of these values is then divided by the height of the interval, also a constant. The resulting values are the entries in the column d. The necessity for multiplying Σfd/N by h, the second constant, and then adding this product to the guessed mean, the first constant, to obtain the mean of the original values should be apparent from the foregoing discussion.
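The two principles, and their combination, can be verified on the text's own values 6, 8, and 10 with the constants 2 and 4.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

original = [6, 8, 10]
origin, unit = 2, 4  # the two constants used in the text

coded = [(x - origin) / unit for x in original]  # subtract, then divide
recovered = origin + mean(coded) * unit          # multiply back, then add back

print(coded)        # the modified values
print(mean(coded))  # mean of the modified values
print(recovered)    # mean of the original values
```

In the grouped-data computation, the guessed mean plays the role of `origin` and the interval height h plays the role of `unit`.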

Whenever a mean is computed from a frequency distribution, the assumption is made that the mean of the values in any interval lies at the mid-point of that interval. Thus it can be seen that one way of computing the mean from grouped data would be to multiply the mid-point of
each interval by the frequency in that interval and divide the sum of 
these products by the number of cases involved. Such a procedure usually 
involves larger values than those involved in the assumed mean procedure, 
but the two computational procedures are mathematically identical. Re- 
gardless of the computational procedure used, when the number of cases 
in a distribution is small, the assumption that the interval mid-point is 
the mean of values in that interval may produce a different mean from 
that which is obtained when the original values, from which a frequency 
distribution has been derived, are added together and divided by the 
number of cases. 

In the computation of the mean for the data in Table 8, the guessed 
mean was assumed to be 107, the mid-point of the interval with a lower 
reported limit of 105 and a lower theoretical limit of 104.5. It makes no 
difference what value is assumed to be the mean, except that it must be 
the mid-point of one of the intervals in the distribution. Suppose 122, 
the mid-point of the interval with a lower reported limit of 120, is chosen 
as the guessed mean. The computation is as follows: 


I.Q. INTERVALS    f     d      fd
145-149           1     5       5
140-144           3     4      12
135-139           5     3      15
130-134           8     2      16
125-129          11     1      11
120-124          17     0       0
115-119          21    -1     -21
110-114          22    -2     -44
105-109          24    -3     -72
100-104          20    -4     -80
95-99            15    -5     -75
90-94            12    -6     -72
85-89             6    -7     -42
80-84             2    -8     -16
Total           167          -363


Substituting in the formula: 


$$\text{Mean} = 122 + \left(\frac{-363}{167}\right)(5) = 111.13$$

If the negative signs are troublesome, they may be avoided by choosing as the guessed mean the mid-point of the lowest interval. In the present distribution, choosing 82 as the guessed mean will eliminate all negative numbers from the computation, as follows:


I.Q. INTERVALS    f     d      fd
145-149           1    13      13
140-144           3    12      36
135-139           5    11      55
130-134           8    10      80
125-129          11     9      99
120-124          17     8     136
115-119          21     7     147
110-114          22     6     132
105-109          24     5     120
100-104          20     4      80
95-99            15     3      45
90-94            12     2      24
85-89             6     1       6
80-84             2     0       0
Total           167           973


Substituting in the formula: 


$$\text{Mean} = 82 + \left(\frac{973}{167}\right)(5) = 111.13$$
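The claim that the choice of guessed mean makes no difference can be checked directly. The sketch below applies the assumed-mean formula, mean = guessed mean + (Σfd/N)h, with the three guessed means used in this section; the names `FREQS`, `MIDPOINTS`, `sum_fd`, and `mean_by_guessed_mean` are hypothetical.

```python
from fractions import Fraction

# Frequencies of the I.Q. intervals, from the highest interval (145-149)
# down to the lowest (80-84); mid-points run from 147 down to 82 in steps of 5.
FREQS = [1, 3, 5, 8, 11, 17, 21, 22, 24, 20, 15, 12, 6, 2]
MIDPOINTS = [147 - 5 * i for i in range(len(FREQS))]
H = 5  # height of each interval


def sum_fd(guessed):
    """Sum of f*d, where d counts intervals above (+) or below (-) the guess."""
    if guessed not in MIDPOINTS:
        raise ValueError("the guessed mean must be the mid-point of an interval")
    total = 0
    for f, m in zip(FREQS, MIDPOINTS):
        d = (m - guessed) // H  # exact, because m - guessed is a multiple of H
        total += f * d
    return total


def mean_by_guessed_mean(guessed):
    """Assumed-mean method: mean = guessed mean + (sum(f*d) / N) * h."""
    return guessed + Fraction(sum_fd(guessed), sum(FREQS)) * H


for guess in (107, 122, 82):
    print(guess, sum_fd(guess), round(float(mean_by_guessed_mean(guess)), 2))
# 107 138 111.13
# 122 -363 111.13
# 82 973 111.13
```

Only Σfd changes with the guess; the correction term (Σfd/N)h always brings the result back to the same mean.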


THE MEDIAN 


The median is that point in the distribution above which and below 
which 50 per cent of the cases lie. It should be noted that the median as 
here defined may differ somewhat from the mid-score. The mid-score is 
the actual score occurring in a distribution above which and below which 
the number of scores is equal. The term mid-score was used quite gen- 
erally in the past when it was assumed that a distribution of scores was 
composed of discrete units. Since the median, when computed, is not 
necessarily a value which occurs in the distribution, it is somewhat diffi- 
cult to interpret the median in a distribution made up of discrete values. 
For example, suppose a distribution is composed of the number of chil- 
dren of school age per family in a city. It is quite conceivable that the 
median, when computed, might be 2.52 children. Many writers contend 
that a median in this case has no significance since fractional parts of a 
child are inconceivable. It is certainly true that interpretations made in 
a continuous series by describing a place in the distribution below which 
half of the cases lie carry more meaning than do similar interpretations 
made in a discrete series. 

The median is a position average, an average depending on the relative 
magnitude of a set of scores rather than on their absolute magnitude. 
Since it is a position average, it is necessary that the data be arranged 
in the order of their size before it can be computed. Obviously, it is the 
appropriate measure to use whenever a description is desired which is 
insensitive to extreme variations in the data. 




Computation of the Median. When values are ungrouped, the mid-score 
or mid-value of a numerically arrayed distribution is usually defined as 
the median. To obtain the median, the data are arranged in order of their 
size, i.e., into an array, and the median is found by counting. If an odd
number of cases is involved, the score of the middle case is the median. 
If an even number of cases is present, the general policy is to consider 
the mean of the scores of the two middle cases as the median. 
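The counting rule for ungrouped values can be sketched as follows. The function name `median_by_counting` and the sample values are hypothetical; Python's standard `statistics.median` follows the same odd/even convention.

```python
def median_by_counting(values):
    """Median of ungrouped values: arrange them in an array and count to the middle."""
    array = sorted(values)
    n = len(array)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        return array[middle]  # odd N: the score of the middle case
    return (array[middle - 1] + array[middle]) / 2  # even N: mean of the two middle scores


print(median_by_counting([7, 3, 9, 5, 4]))  # 5
print(median_by_counting([7, 3, 9, 5]))     # 6.0
```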

In many cases the value of the median is desired when the data have
been grouped into a frequency distribution. The computation then con- 
sists of identifying that point among the grouped values above which and 
below which are found one-half of the cases. 

A frequency distribution of the heights of entering male college freshmen who were examined during one day is shown in Table 9. Heights for these men were reported to the nearest half inch. Thus, a height reported as 62.5 inches may be anywhere from 62.25 to 62.75 inches, and the lower theoretical limit of the two-inch class interval in which that height is tabulated is 61.75. To compute the median height, then, requires the identification of that height below which and above which are found the heights of 80.5 men (one-half of 161).

TABLE 9. Height of Male Freshmen

HEIGHT INTERVALS (INCHES)    NUMBER OF MEN
78.0-79.5                          1
76.0-77.5                          3
74.0-75.5                          8
72.0-73.5                         22
70.0-71.5                         63
68.0-69.5                         42
66.0-67.5                         16
64.0-65.5                          4
62.0-63.5                          2
Total                            161

The procedure may be clearer if the distribution is transferred to the following form, in which a column has been added for the cumulative frequency.

HEIGHT INTERVALS (INCHES)     f     CUMULATIVE f
78.0-79.5                     1         161
76.0-77.5                     3         160
74.0-75.5                     8         157
72.0-73.5                    22         149
70.0-71.5                    63         127
68.0-69.5                    42          64
66.0-67.5                    16          22
64.0-65.5                     4           6
62.0-63.5                     2           2
Total                       161

From the cumulative frequency distribution, it can be seen that the median lies somewhere in the interval with a reported lower limit of 70 and a theoretical lower limit of 69.75, since 64 cases lie below 69.75 and 127 cases lie below 71.75.

The location of the median will be clearer if the interval with a theoretical lower limit of 69.75 is magnified. The point in this interval must be identified so that 16.5 cases (80.5 - 64) fall below it and 46.5 cases fall above it. If these 63 cases are equally distributed throughout the interval, an assumption which is always made and which is consistent with the assumption necessary for the computation of the mean from grouped data, then the median lies 16.5/63 of the distance, 2 inches, above the bottom of the interval. Thus, the median is located at a point 16.5/63 of 2, or 0.52, above 69.75. The median height of these men, then, is 70.27 inches.


71.75 
A 
g (46.5 cases above median) 
3 
a 
N 
Median 
(16.5 cases below median) 
69.75 


The foregoing calculations can be reduced to the following formula:

$$\text{Median} = l + \left(\frac{N/2 - f_b}{f_w}\right) h$$

where
$l$ = lower theoretical limit of the interval in which the median lies
$f_b$ = cumulative frequency up to the interval containing the median
$f_w$ = frequency within the interval containing the median
$N$ = total number of cases
$h$ = height of the interval




The substitution of the appropriate values in the equation yields a 
median height of 70.27 inches. 


$$\text{Median} = 69.75 + \left(\frac{161/2 - 64}{63}\right)(2) = 69.75 + 0.52 = 70.27$$
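The interpolation formula translates directly into a short routine. In this sketch (the names `HEIGHTS` and `grouped_median` are hypothetical) each interval is given by its lower theoretical limit and frequency, lowest interval first; the routine accumulates frequencies until it reaches the interval containing N/2, then interpolates within it on the assumption that the cases are spread evenly through that interval.

```python
# Table 9: (lower theoretical limit, frequency), listed from the lowest interval up
HEIGHTS = [
    (61.75, 2), (63.75, 4), (65.75, 16), (67.75, 42), (69.75, 63),
    (71.75, 22), (73.75, 8), (75.75, 3), (77.75, 1),
]
H = 2.0  # height of each interval, in inches


def grouped_median(table, h):
    """Median = l + ((N/2 - cumulative f below) / f within) * h."""
    n = sum(f for _, f in table)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    below = 0  # cumulative frequency up to the current interval
    for lower, f in table:
        if below + f >= half:
            return lower + (half - below) / f * h
        below += f
    raise ValueError("frequencies do not reach N/2")


print(round(grouped_median(HEIGHTS, H), 2))  # 70.27
```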




THE MODE 


The mode is the value which occurs most frequently in a distribution. 
Many writers have designated this measure as the crude mode, to dis- 
tinguish it from the theoretical mode which is mathematical in its origin 
and finds but little use in educational and psychological research. In a 
distribution where the values have been grouped, the mid-point of the 
class interval containing the greatest frequency is generally regarded as 
the mode. It is possible for a given distribution to have more than one 
mode. 
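Both forms of the crude mode can be sketched briefly. For ungrouped values, `statistics.multimode` (Python 3.8 and later) returns every most frequent value, which also covers distributions with more than one mode. For grouped data, the sketch takes the mid-point of each interval with the greatest frequency, using Table 9. The names `TABLE_9` and `grouped_modes` and the ungrouped sample are hypothetical.

```python
from statistics import multimode

# Ungrouped values: this hypothetical list has two modes.
print(multimode([3, 5, 5, 7, 8, 8]))  # [5, 8]

# Table 9: (lower reported limit, upper reported limit, number of men)
TABLE_9 = [
    (78.0, 79.5, 1), (76.0, 77.5, 3), (74.0, 75.5, 8), (72.0, 73.5, 22),
    (70.0, 71.5, 63), (68.0, 69.5, 42), (66.0, 67.5, 16), (64.0, 65.5, 4),
    (62.0, 63.5, 2),
]


def grouped_modes(table):
    """Mid-point of every interval that holds the greatest frequency."""
    top = max(f for _, _, f in table)
    return [(lo + hi) / 2 for lo, hi, f in table if f == top]


print(grouped_modes(TABLE_9))  # [70.75]
```

For heights reported to the nearest half inch, the mid-point of the reported limits 70.0 and 71.5 coincides with the mid-point of the theoretical limits 69.75 and 71.75, so either may be used here.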


CENTRAL TENDENCY IN NONROUNDED 
MEASUREMENT DISTRIBUTIONS 


The computational procedures of the mean, median, and mode in a 
frequency distribution which have been described in the foregoing sec- 
tions are based upon the assumption that any given measurement is an 
approximation representing a range of values both above and below the 
reported measurement. This assumption is the usual one encountered by 
the research worker in education and psychology. On the other hand, 
there are situations in which the individual measurement is not rounded 
off to the nearest reported value. For example, individuals generally re- 
port age as of the last birthday, and a score on an objective test generally 
refers to the lower level of achievement in the characteristic evaluated. 
When such measurements have been assembled in a frequency distribu- 
tion, the mid-point of an interval must be adjusted accordingly. If ages 
are reported to the last birthday, the mid-point of an interval of reported 
ages of 20 years to 29 years is 25 years, rather than the 24.5 years obtained when ages are reported to the nearest birthday.

Scores on an objective test, likewise, may be considered as approximate 
measurements reported to the last whole number. If so considered, the 
mid-point of an interval of scores reported as 20 to 29 would be 25 rather 
than 24.5 as shown in the foregoing computational procedures. It should 
be noted, however, that reported test scores are relative measurements. 
Whether all such reported scores are one-half a unit too low is of little 
practical or theoretical importance. 
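The effect of the reporting convention on an interval mid-point can be sketched as follows, assuming that a rounded report v stands for values from v - unit/2 up to v + unit/2, while a truncated report (age at last birthday, say) stands for values from v up to v + unit. The function `interval_midpoint` and its parameters are hypothetical.

```python
from fractions import Fraction


def interval_midpoint(low, high, unit=1, truncated=False):
    """Mid-point of an interval whose reported limits are low and high.

    Rounded reports: v stands for v - unit/2 up to v + unit/2.
    Truncated reports: v stands for v up to v + unit.
    """
    low, high, unit = Fraction(low), Fraction(high), Fraction(unit)
    if truncated:
        lower_limit, upper_limit = low, high + unit
    else:
        lower_limit, upper_limit = low - unit / 2, high + unit / 2
    return (lower_limit + upper_limit) / 2


print(float(interval_midpoint(20, 29, truncated=True)))   # 25.0 (age at last birthday)
print(float(interval_midpoint(20, 29, truncated=False)))  # 24.5 (age at nearest birthday)
```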

The average course mark, defined as the quotient of the honor points obtained and the credit hours completed, may be obtained by rounding off decimal places, as is usually done, or by dropping all decimals. For example, an investigator may report an average course mark of 2.6 or 2.7, depending on whether the quotient 32/12 = 2.67 is truncated or rounded, for a college student who has compiled the following academic record for a single term:


HOURS    MARK (POINTS)    HONOR POINTS
3        A (4)                 12
3        B (3)                  9
5        C (2)                 10
1        D (1)                  1
12       Total                 32


When such average course marks are grouped into a frequency distribu- 
tion, the mid-point of an interval is dependent upon the investigator’s 
procedure of rounding off or dropping decimals. Thus the interval mid- 
point can be designated only when the procedure of obtaining the re- 
ported measurement is known. 
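The two reporting procedures for the record above can be reproduced with Python's `decimal` module, whose `quantize` method accepts an explicit rounding mode. This is only an illustration of the arithmetic; the names `RECORD`, `rounded`, and `truncated` are hypothetical.

```python
from decimal import ROUND_DOWN, ROUND_HALF_UP, Decimal

# (credit hours, honor points per hour) for the term shown above: A=4, B=3, C=2, D=1
RECORD = [(3, 4), (3, 3), (5, 2), (1, 1)]

hours = sum(h for h, _ in RECORD)             # 12
honor_points = sum(h * p for h, p in RECORD)  # 32
average = Decimal(honor_points) / Decimal(hours)  # 2.666...

rounded = average.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)  # rounding off
truncated = average.quantize(Decimal("0.1"), rounding=ROUND_DOWN)   # dropping decimals
print(rounded, truncated)  # 2.7 2.6
```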

In most problems occurring in educational and psychological research, the difference between the means of two different groups, or the difference between the means of two sets of measurements of the same group, constitutes the information to be evaluated. In this type of problem, whether measurements have been assembled by rounding off or by dropping decimals is unimportant, since the difference between means is identical under both procedures. It should be emphasized, however, that comparisons between two means should be undertaken only when the rounding off or dropping of decimals has been done in the same manner for the groups being compared.


PRINCIPLES OF INTERPRETATION 


In contrast to the median and the mode, the mean is a measure of 
central tendency the computation of which is influenced by the size of 
each individual value. This principle may be made apparent from an 
example of test scores. Let it be supposed that the scores made on a test of geography achievement by a group of 100 pupils range from 5 up to 80. Inspection of this distribution indicates that one half of the scores lie above 53 and one half below 53. Then, by definition, 53 is the median score of this group on this test. Also, let it be supposed that 75 is the score which appears most frequently in the distribution. Then, by definition, a score of 75 is the mode of the distribution. Likewise, when the sum of the scores, 5,620, is divided by the number of scores (100), a value of 56.2 is obtained. Then, by definition, 56.2 is the mean score in the distribution.

It should be evident that the median score of 53 will be unchanged if the highest score is changed to 180, or even if it is reduced to 54. That is,
except for arranging the data in sequence, the median does not depend 
upon the size of any individual value. It remains unchanged even when 
the low scores are made lower or when the high scores are made higher 
provided they do not cross over the median point. It is essentially a rank- 
ing measure; i.e., if the scores are arranged in the order of their size, the 
median is the number associated with the mid-rank. 

The determination of the mode is unaffected by the size of the values 
in the distribution. Only the similarity of objects in a series influences 
the mode. After the judgment of similarity has been made, any of the 
scores may be changed to other values, providing these changes do not 
produce a greater number of similar scores than the original modal scores. 

Since the size of each score is considered in the addition necessary for computing the mean, it is obvious that the mean will be increased if the highest score is changed from 80 to 180. In other words, the mean is sensitive to the size of each score which appears in the distribution. This situation is reflected in the algebraic sum of the differences between the mean and each value in a distribution: when the mean is subtracted from each value individually and the algebraic sum of these differences is obtained, this sum is invariably zero.

Fig. 9. Location of the Mean, Median, and Mode in a Normal Distribution. [Figure: frequency plotted against size of numerical variable.]
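The contrasting behavior of the three measures can be illustrated on a small set of hypothetical scores, chosen so that the median is 53 and the mode is 75 as in the example above. The sketch raises the highest score to 180 and lowers it to 54, then checks that the deviations from the mean sum to zero; exact fractions (a choice made for this illustration) avoid floating-point residue in that sum.

```python
from fractions import Fraction
from statistics import median, multimode

# Hypothetical scores (not from the text): median 53, mode 75
scores = [5, 31, 48, 53, 75, 75, 80]
raised = [5, 31, 48, 53, 75, 75, 180]   # highest score changed from 80 to 180
lowered = [5, 31, 48, 53, 75, 75, 54]   # highest score reduced to 54


def exact_mean(values):
    """Mean as an exact fraction, so that deviations sum to exactly zero."""
    return Fraction(sum(values), len(values))


print(median(scores), median(raised), median(lowered))  # 53 53 53
print(multimode(scores), multimode(raised))              # [75] [75]
print(round(float(exact_mean(scores)), 2), round(float(exact_mean(raised)), 2))  # 52.43 66.71

deviations = [x - exact_mean(scores) for x in scores]
print(sum(deviations))  # 0
```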

In any symmetrical, single-peaked distribution such as the normal distribution, the mean, the median, and the mode coincide. From a graph of a normal curve it is evident that these three measures of central tendency coincide, as at point A in Figure 9. Many sets of research data in education and psychology follow this type of distribution; many others do not. It is evident that the interpretations which are appropriate for the mean, the median, and the mode are interchangeable in a normal distribution. It is legitimate to compute one of these measures and interpret it in the light of another only when it has been shown previously that the distribution under consideration follows the normal curve. The assumption of a normal curve without any supporting evidence, however, often leads to unjustified conclusions.

An illustration of the position of the mean, median and mode in one 
type of nonnormal distribution is given in Figure 10. Point A is the mode, 
point B is the median, and point C is the mean. 

The mean, median, and mode may be located anywhere between the 
lowest and the highest value, depending upon the shape of the distribu- 
tion. If a test is given in general science to 100 boys and the scores vary 
all the way from 20 to 60, it is evident that the mean, median, and mode 
must lie somewhere between 20 and 60. However, it is not necessary that 
they lie halfway between 20 and 60. 

The percentage of cases above or below the mean varies from 0 to 100 (not inclusive), depending upon the shape of the distribution. Although it is true in many distributions that approximately 50 per cent of the cases lie above and below the mean, there are many other distributions in which this is not true. As an example, consider a distribution made up of the salaries of 100 graduates of a certain college in the class of 1935. If 99 are making salaries of less than $10,000 annually and the income of the other one is $5,000,000, the mean income for the entire 100 will be greater than $50,000, although 99 per cent will have incomes smaller than the mean income. The mean, in itself, gives no evidence concerning the percentage of cases which lie above or below it.

Fig. 10. Location of the Mode, Median, and Mean in a Positively Skewed Distribution. [Figure: frequency plotted against size of numerical variable, with points A, B, and C marked along the base line.]

In contrast to the mean, the median multiplied by the number of cases 
may not give the total of the values in the distribution. In the example 
just preceding, the mean income is approximately $57,000. If this mean 
is multiplied by the number of cases (100), the product will represent 
the total income of the entire group. In other words, whenever the mean 
is multiplied by the number of cases, the product yields the total of all 
the values in the distribution. This condition is true for the median only when it is the same as the mean, and it is not true in any other case. In
the foregoing example, the median was actually found to be approximately 
$7,000. If this median value is multiplied by 100, $700,000 is obtained, 
which obviously is much below the total income for the entire group. 

If a distribution is composed of groups and the mean of each group is 
known, the mean of the total distribution may be found as follows: mul- 
tiply the mean of each group by the number of cases in that group, sum 
these products, and divide by the number of cases in the distribution. 

The known enrollment data for three counties are as follows: 


COUNTY    NUMBER OF HIGH SCHOOLS    MEAN ENROLLMENT
A                  10                     240
B                   8                     300
C                  12                      75


The mean enrollment for all three counties may not be properly com- 
puted by adding together 240, 300, and 75, and then dividing by 3. Rather 
these means must be weighted by the number of high schools in each 
county. 

The total enrollment in county A is 2,400, in county B 2,400, and in 
county C 900. The combined enrollment in the three counties is, there- 
fore, 5,700. Since there are 30 high schools, the mean enrollment is 190 
pupils. Whenever a mean has been computed from known means, care 
should be taken to see that the known means have been properly 
weighted. 
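The weighting rule can be sketched as a short function applied to the county data. The names `COUNTIES` and `combined_mean` are hypothetical; the unweighted average is computed only to show how far it departs from the properly weighted mean.

```python
from fractions import Fraction

# County: (number of high schools, mean enrollment per school)
COUNTIES = {"A": (10, 240), "B": (8, 300), "C": (12, 75)}


def combined_mean(groups):
    """Weight each group mean by its number of cases, then divide by the total N."""
    total_cases = sum(n for n, _ in groups)
    total_values = sum(n * m for n, m in groups)
    return Fraction(total_values, total_cases)


weighted = combined_mean(COUNTIES.values())
unweighted = Fraction(sum(m for _, m in COUNTIES.values()), len(COUNTIES))
print(weighted)           # 190
print(float(unweighted))  # 205.0 (the improper, unweighted average)
```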

In contrast to the mean, if a distribution is composed of groups and 
the median of each group is known, the median of the entire distribution 
cannot be computed from these known medians. 

The following are the scores made on an English test by three sections: 


Section A: 8, 10, 11, 12, 12, 12, 12, 13, 24, 28, 29, 30, 32, 33, 34. Median = 13 
Section B: 9, 10, 10, 11, 11, 11, 12, 14, 15, 16, 25, 26, 27, 28, 29. Median = 14 
Section C: 11, 12, 13, 14, 16, 24, 25, 26, 27, 28, 29. Median = 24


It is obvious that there is no way of weighting these medians that will 
produce the median score for all pupils, which in this case is 15. When- 
ever a median has been computed, care should be taken to see that origi- 
nal data have been used in its computation. 
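The English-test example can be verified with the standard `statistics.median` function. The sketch also weights each section median by the size of its section, which, as the text states, does not recover the median of all pupils. The section lists are copied from the text; the variable names are hypothetical.

```python
from statistics import median

SECTION_A = [8, 10, 11, 12, 12, 12, 12, 13, 24, 28, 29, 30, 32, 33, 34]
SECTION_B = [9, 10, 10, 11, 11, 11, 12, 14, 15, 16, 25, 26, 27, 28, 29]
SECTION_C = [11, 12, 13, 14, 16, 24, 25, 26, 27, 28, 29]

sections = [SECTION_A, SECTION_B, SECTION_C]
print([median(s) for s in sections])              # [13, 14, 24]
print(median(SECTION_A + SECTION_B + SECTION_C))  # 15, from the original data

# Weighting the section medians by section size does not give 15.
n_total = sum(len(s) for s in sections)
weighted = sum(median(s) * len(s) for s in sections) / n_total
print(round(weighted, 2))  # 16.32
```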

In most types of distributions, the mean is the most stable measure of 
central tendency that may be estimated from a sample. Often, in order 
to save time and expense, it is desirable to estimate the mean or the 
median of a distribution by an analysis of a sample of that distribution. 
For example, the mean and the median salaries of teachers in a state may be estimated by taking a sample of one-tenth, or of some other fractional part, of the total number of teachers. In this sample, chosen at random, the mean and median salaries may be computed. If a number of other similar samples are chosen from the same population, there will be some fluctuation in the mean and median salaries from sample to sample. However, the fluctuation will be somewhat greater for the various medians than it is for the various means.¹ In most types of distributions
the fluctuation will be less for the means than it is for any other measure 
of central tendency which is computed in these same samples. 
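The "repeated trials" mentioned in the footnote can be imitated by simulation. This sketch assumes a normally distributed population of hypothetical salaries (mean 3000, standard deviation 400), draws many random samples of 101 cases, and compares the spread of the sample means with that of the sample medians. For large samples from a normal population, standard theory puts the ratio of the two spreads near 1.25, so the medians should fluctuate visibly more. The sample size, trial count, and seed are arbitrary choices.

```python
import random
from statistics import fmean, median, stdev

random.seed(1)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 101
TRIALS = 2000

sample_means, sample_medians = [], []
for _ in range(TRIALS):
    # hypothetical salaries drawn from a normal population
    sample = [random.gauss(3000, 400) for _ in range(SAMPLE_SIZE)]
    sample_means.append(fmean(sample))
    sample_medians.append(median(sample))

print(f"spread of sample means:   {stdev(sample_means):.1f}")
print(f"spread of sample medians: {stdev(sample_medians):.1f}")
print(f"ratio: {stdev(sample_medians) / stdev(sample_means):.2f}")
```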

Any measure of central tendency, mean, median, or mode, in one dis- 
tribution may be identical with the same measure in another distribution, 
yet the distributions vary markedly from each other. For example, two 
classes might be compared with respect to intelligence, or marks on an 
examination, or some other characteristic. If the mean scores are the 
same for the two classes, it might be erroneously concluded that there are 
no differences between the classes. The only conclusion that can be 
reached from this comparison is that there is no difference between the 
mean scores of the classes. That differences may appear in scores even 
though the means are identical is shown by the following scores in his- 
tory obtained by boys and girls. 

Boys: 98, 86, 90, 24, 80, 86, 90, 88, 86, 87, 86, 87, 87, 87 
Girls: 90, 78, 84, 86, 81, 82, 84, 82, 88, 79, 84, 81, 87, 82, 80, 80 

To conclude that there was no difference between scores of boys and 
girls in this class is obviously unwarranted even though the mean score 
for both boys and girls is 83. The same comment is applicable, likewise, 
to either the median or the mode. 
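A quick computation on the history scores shows what the identical means conceal. The score lists are copied from the text; printing the median, lowest, and highest score beside each mean is simply one convenient way to expose the difference.

```python
from statistics import mean, median

BOYS = [98, 86, 90, 24, 80, 86, 90, 88, 86, 87, 86, 87, 87, 87]
GIRLS = [90, 78, 84, 86, 81, 82, 84, 82, 88, 79, 84, 81, 87, 82, 80, 80]

for label, scores in (("Boys", BOYS), ("Girls", GIRLS)):
    print(label, mean(scores), median(scores), min(scores), max(scores))
# Boys 83 87.0 24 98
# Girls 83 82.0 78 90
```

The means agree exactly, yet the boys' scores stretch from 24 to 98 while the girls' lie between 78 and 90.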

Failure to find differences between means (or medians) of two groups 
may be due to lack of precision in measurement. If an experiment is set 
up for evaluating a substitute food for cow’s milk in feeding babies, no 
differences should be expected in weight if the babies were weighed on 
cattle scales. The substitute might have worth-while merit that weights obtained on a less crude instrument would reveal, but such merit would probably not be apparent in this experiment.

Failure to recognize this principle is frequently evidenced in researches 
involving educational or psychological measurements. Often, two teach- 
ing techniques are compared at the end of a short period of experimenta- 
tion. The gains made in one group are compared with the gains made in 
the other, by means of a standardized test. If each unit on the test rep- 
resents, for instance, one month’s achievement, it will require wide dif- 
ferences in the two groups before the test will reveal any differences. 

Differences between similar measures of central tendency in different distributions are in themselves no evidence as to the desirability of those differences. Suppose it is found in a certain high school that those who attend motion pictures more than once a week receive lower grades than those whose attendance is less frequent. Finding this difference does not prove that attendance at motion pictures causes lower grades. Perhaps lower grades caused greater motion-picture attendance. Or perhaps both have been caused by some other factor or combination of factors. Whether it is more desirable to receive high grades and attend no motion pictures or to attend motion pictures more frequently and suffer lower marks cannot be shown by a statistical difference.

¹ This statement may be verified by repeated trials or by mathematical means lying beyond the scope of this book.

Each of these principles of interpretation follows directly from the 
definitions which have been assigned to the mean, the median, and the 
mode. Violations of any of these principles, when interpreting measures 
of central tendency, produce unsound inferences and, many times, erro- 
neous conclusions. 


Exercises 


1. Compute the mean and median of the intelligence quotients of high school 
freshmen, shown in the following table. 


INTERVAL (I.Q.)   FREQUENCY   INTERVAL (I.Q.)   FREQUENCY
140-144 1 100-104 69 
135-139 1 95-99 52 
130-134 3 90-94 35 
125-129 7 85-89 12 
120-124 15 80-84 6 
115-119 38 75-79 2 
110-114 53 70-74 at 
105-109 64 65-69 0 


2. Compute the mean and median score made by fourth-grade pupils in a 
reading comprehension test, shown in the following table. Tests were scored 
according to the number of items correct. 


INTERVAL   FREQUENCY   INTERVAL   FREQUENCY
60-62          1        33-35         4
57-59          0        30-32         3
54-56          0        27-29         2
51-53          3        24-26         0
48-50          5        21-23         1
45-47          9        18-20         0
42-44          1        15-17         2
39-41         10        12-14         1
36-38          6         9-11         1


3. Show algebraically that $\Sigma x = 0$, i.e., that the sum of the deviations of a series of scores about the mean of the series is zero.


4. The table below gives the amount of time each of 31 students in a sixth 
grade spent in one week in actual recitation, together with their intelligence 
quotients and their reading test scores. 


PUPIL   RECITATION TIME (MINUTES)   I.Q.          READING TEST
 1              46                  107                41
 2              68                  116                76
 3              45                  110                40
 4              21                  103                24
 5              18                   92                22
 6              74                  138                82
 7              40                   82                26
 8              48                   87                31
 9              55                   94                36
10              27                   78                28
11              71                   84                21
12              31                  110                35
13              36                  114                38
14              47                  120                40
15              54                  111                36
16              38                  103                28
17              43                  105                29
18              61                   94                22
19              44                   92                22
20              10                  128                76
21              18                   87                21
22              29                   83                18
23              35                  108                50
24              62                  109                51
25              54                  110                52
26              51                  121                74
27              46                   97                27
28              42                   94                30
29              38                  [illegible]        64
30              32                   90                29
31              50                  107                72


a. Compute the mean and median time spent in recitation per pupil. The time is expressed to the nearest minute.

b. Compute the mean and median intelligence quotient. The I.Q.'s are expressed to the nearest point, i.e., 101, 102, and so forth.

c. Compute the mean and median reading test score. The test scores are expressed in units in which 21 means from 21.0 up to 22.0, and so forth.


3

Quartiles, Deciles, and Percentiles


The median has previously been defined as that point in a distribution 
below which 50 per cent of the cases lie. This measure is a very useful 
one for describing a distribution. However, it is not a complete descrip- 
tion. There are occasions when a description is desired which will char- 
acterize a distribution at some other point or points. For example, an 
investigator may wish to know the value occurring at that point in a 
distribution below which lie 20 per cent of the cases. A number of meas- 
ures have been developed to describe a distribution at practically any 
place along the scale of values that is appropriate for the purpose in 
view. The most important of these measures are the quartiles, the deciles, 
and the percentiles. 


QUARTILES 


The first quartile, ordinarily designated as $Q_1$, is that point in a distribution below which 25 per cent of the cases lie. The third quartile, ordinarily designated as $Q_3$, is that point in a distribution below which 75 per cent of the cases lie. The second quartile, by definition, would be identical with the median; for this reason, the term second quartile is not used. It is evident that a distribution is divided into four parts by the three quartile points. Since the quartiles are points, it follows that it may not properly be said that any score, or value, lies in a quartile. A score may lie at the first quartile, or between quartiles, or just above the third quartile, and so on.

Frequently, in educational and psychological literature, a statement is made that a specified score occurs in the lowest quartile. When this occurs, the writer's meaning is generally clear: he means that the score lies in the lowest quarter of the scores. There can be no possible advantage in referring to the lowest fourth of the scores as a quartile, since the lowest quarter conveys the meaning intended without coining a new meaning for the word quartile. The use of the word quartile in referring to a quarter of the distribution, therefore, not only is unnecessary but violates the commonly accepted meaning that a quartile is a point rather than a range of values.

A quartile is a relative measure and refers only to the distribution in 
which it has been computed. Thus, a score at the third quartile in a gen- 
eral science test given in one high school may be 80. This statement 
would be interpreted that in this particular high school 75 per cent of 
the pupils taking the general science test made scores below 80. It is 
quite possible that in another high school the score for the same test at 
the third quartile might be 88. It is not necessary for the score at the 
third quartile in one distribution to be identical with the score at the 
third quartile in another distribution. The only time that one can be sure that
the third quartile represents the same score in two different distributions 
is when the two distributions are known to be identical. 

The mean of a distribution may lie anywhere between the first and 
third quartiles, above the third quartile, or below the first quartile, de- 
pending upon the shape of the distribution. It will be recalled that the 
percentage of cases above or below the mean varies from 0 to 100 per 
cent, not inclusive, depending upon the shape of the distribution. This 
means that the lowest value and the highest value in a distribution de- 
termine the limits for the location of the mean and that the quartiles 
in no way limit the fluctuations which occur in various types of distribu- 
tions. It is true that in most distributions the mean will lie somewhere 
between the first and third quartiles. In a normal distribution, the mean
will be located exactly halfway between the first and third quartiles. 

The median lies somewhere between the first and third quartiles, its 
exact location depending upon the shape of the distribution. It is obvious 
that the median, below which 50 per cent of the cases lie, must be located 
somewhere between that point in a distribution below which 25 per cent 
of the cases lie, and that point in a distribution below which 75 per cent 
of the cases lie. However, it is not necessary that the median be located 
halfway between the first and third quartiles. A distribution is conceiv- 
able in which many scores are bunched immediately above the first 
quartile, and, in consequence, the median may lie very close to the first 
quartile. The scores above the median up to the third quartile may be 
quite widely scattered, resulting in a wide difference between the score 
at the median and the score at the third quartile. 
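The quartiles of a grouped distribution can be located by the same interpolation used for the median, assuming, as before, that the cases are spread evenly through each interval. The sketch below (the names `HEIGHTS` and `grouped_point` are hypothetical) applies this to Table 9. In these data the median lies about 1.65 inches above $Q_1$ but only about 1.28 inches below $Q_3$, an instance of the median falling between the quartiles without lying halfway.

```python
# Table 9 heights: (lower theoretical limit, frequency), lowest interval first
HEIGHTS = [
    (61.75, 2), (63.75, 4), (65.75, 16), (67.75, 42), (69.75, 63),
    (71.75, 22), (73.75, 8), (75.75, 3), (77.75, 1),
]
H = 2.0  # height of each interval, in inches


def grouped_point(table, h, proportion):
    """Point below which `proportion` of the cases lie, interpolating
    within the interval that contains it (cases assumed spread evenly)."""
    n = sum(f for _, f in table)
    target = proportion * n
    below = 0  # cumulative frequency up to the current interval
    for lower, f in table:
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * h
        below += f
    raise ValueError("proportion must lie between 0 and 1")


q1 = grouped_point(HEIGHTS, H, 0.25)
md = grouped_point(HEIGHTS, H, 0.50)
q3 = grouped_point(HEIGHTS, H, 0.75)
print(round(q1, 2), round(md, 2), round(q3, 2))  # 68.62 70.27 71.55
```

Passing 0.10, 0.20, and so on would give the decile points in the same way.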


DECILES 


The first decile is that point in a distribution below which 10 per cent 
of the cases lie; the second decile is that point below which 20 per cent 
of the cases lie; and so forth; the ninth decile is that point below
which 90 per cent of the cases lie. 


Fig. 11. Deciles in a Normal Distribution. [Figure: normal curve, frequency plotted against value, with decile points 1 through 9 marked along the base line.]


Like the quartiles, the deciles are points rather than ranges of values in 
a distribution. There are nine decile points which divide a distribution 
into tenths. It follows, therefore, that it may not properly be said that a 
score lies in a decile. It is quite appropriate, however, to say that a score 
lies in the lowest tenth of the distribution. 

Ordinarily, when a distribution is plotted, the values are shown along the base line (the X-axis). In Figure 11 a normal distribution is plotted and the deciles are indicated along this base line. If ordinates are erected at each of these deciles, the area, or the number of cases in the distribution, is divided into tenths. It is evident that the distances along the base
line between two adjacent deciles are not necessarily equal. This is an- 
other way of saying that the range of values in the various tenths of a 
distribution may not be equal. 

It is perfectly possible to plot a distribution which is made up of 
deciles rather than the original values. When this is done one-tenth of 
the total number of cases appears in each tenth of the distribution. In 
Figure 12 such a distribution has been plotted. Distances along the base 
line in this case represent one-tenth of the cases. This type of a distribu- 
tion is known as a rectangular distribution. The computation of the 
decile points automatically produces a rectangular distribution from any distribution, normal or otherwise.

Fig. 12. Distribution of Deciles from a Normal Distribution. [Figure: frequency plotted against the tenths of the distribution; every tenth has the same height.]
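Both points about deciles in a normal distribution can be checked with the standard library's `statistics.NormalDist` (Python 3.8 and later). The sketch computes the nine decile points of a unit normal curve, in standard-deviation units, shows that the distances between adjacent deciles are widest in the tails and narrowest near the middle, and confirms that each band between adjacent deciles holds one-tenth of the area, which is what makes the distribution of deciles rectangular.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)

# Decile points 1 through 9 of a unit normal curve, in standard-deviation units
deciles = [unit_normal.inv_cdf(k / 10) for k in range(1, 10)]
gaps = [b - a for a, b in zip(deciles, deciles[1:])]

# Share of the area (cases) lying between each pair of adjacent deciles
shares = [unit_normal.cdf(b) - unit_normal.cdf(a) for a, b in zip(deciles, deciles[1:])]

for k, d in enumerate(deciles, start=1):
    print(f"decile {k}: {d:+.3f}")
print("distances between adjacent deciles:", [round(g, 3) for g in gaps])
print("share of cases between adjacent deciles:", [round(s, 3) for s in shares])
```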


PERCENTILES


The first percentile is that point in a distribution below which 1 per 
cent of the cases lie; the second percentile, that point below which 2 
per cent of the cases lie; the third percentile, that point below which 3 
per cent of the cases lie; and so forth; and the ninety-ninth percentile, 
that point below which 99 per cent of the cases lie. Like the quartiles and 
the deciles, percentiles are points rather than ranges of values in a distribution. These 99 percentiles divide the distribution into 100 parts, each containing 1 per cent of the cases. Any value appearing at the first percentile or below is given a percentile rank of 1; between the first and second percentiles, a percentile
rank of 2; and so on; and any value appearing above the ninety-ninth 
percentile is given a percentile rank of 100. In this manner, a distribution 
may be divided into percentile ranks from 1 up to 100. 
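The assignment of percentile ranks from the 99 percentile points can be sketched with a binary search. The function name `percentile_rank` is hypothetical, and so is one convention: a value falling exactly on a percentile point above the first is given the lower of the two possible ranks (for example, a value exactly at the second percentile receives rank 2), a case the text does not settle. The usage lines compute percentile points for 300 hypothetical scores with `statistics.quantiles` (Python 3.8 and later), in keeping with the advice that percentiles be found only for distributions of at least 200 or 300 cases.

```python
from bisect import bisect_left
from statistics import quantiles


def percentile_rank(value, percentile_points):
    """Percentile rank 1-100 from the 99 sorted percentile points P1..P99.

    At or below P1 -> 1; above P1 up to P2 -> 2; ...; above P99 -> 100.
    """
    if len(percentile_points) != 99:
        raise ValueError("expected 99 percentile points")
    # bisect_left counts the percentile points lying strictly below the value
    return 1 + bisect_left(percentile_points, value)


scores = list(range(1, 301))            # 300 hypothetical scores
points = quantiles(scores, n=100)       # the 99 percentile points
print(percentile_rank(1, points))       # 1   (lowest score)
print(percentile_rank(300, points))     # 100 (highest score)
```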

Everyone is familiar with the usual method of giving numerical ranks to values. The scores from any given test may be placed in rank according to their size. In ordinary practice, the highest score is given a numerical rank of 1; the second highest, 2; and so on. Percentile ranks are similar to this type of ranking except in two respects. First, the lowest numerical rank appears at the top of the distribution, whereas the lowest percentile rank appears at the bottom of the distribution. Second, percentile ranks by the very nature of their computation are ranks based on 100 cases or 100 scores, whereas numerical ranks are based on the actual number of scores in the distribution.

TABLE 10. Numerical and Percentile Ranks of 100 Raw Scores

RAW SCORE    NUMERICAL RANK    PERCENTILE RANK
138                 1                100
135                 2                 99
133                 3                 98
...               ...                ...
13                 98                  3
10                 99                  2
9                 100                  1

Stated otherwise, numerical rank becomes percentile rank whenever 
there are exactly 100 cases in the distribution and whenever the rank 
order from 1 to 100 is reversed. Thus, if a distribution is made up of 100 
cases, the corresponding numerical ranks and percentile ranks are shown 
in Table 10. It should be noted that, in actual situations, percentiles 
should not be found for distributions containing less than 200 or 300 
cases. 

Like the quartiles and deciles, the difference between two percentiles in any distribution represents a number of cases rather than a difference in values or scores. Consequently, it is not proper to interpret differences between percentiles in terms of values or scores. For example, suppose that the salary at the twenty-third percentile is $2920 in a distribution made up of elementary school teachers' salaries, and the salary at the twenty-fourth percentile is $2940. Even though a range of $20 in salary is found between these percentiles, little or no evidence is established about the range which will be found between the fifty-eighth and the fifty-ninth percentiles or between the ninety-first and the ninety-second percentiles. Of course, 1 per cent of the cases lie between consecutive percentiles; but there is no way of knowing, from the percentiles themselves, the range of salaries between consecutive percentiles.
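A minimal Python sketch of this point, assuming a hypothetical normal salary distribution (mean $3500, standard deviation $500; these figures are illustrative and not from the text). Each gap spans 1 per cent of the cases, yet the salary gaps differ; for a real, possibly skewed, salary distribution they could differ even more.

```python
from statistics import NormalDist

# Hypothetical salary distribution, assumed normal with mean $3500 and
# standard deviation $500 (illustrative figures, not data from the text).
salaries = NormalDist(mu=3500, sigma=500)


def gap(p):
    """Salary difference between the p-th and (p + 1)-th percentiles."""
    return salaries.inv_cdf((p + 1) / 100) - salaries.inv_cdf(p / 100)


for p in (23, 58, 91):
    print(f"P{p} to P{p + 1}: ${gap(p):.2f}")
```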

Since percentile ranks are relative measures, it follows that two percentile ranks may not properly be compared unless they have been computed from the same distribution or from similar distributions. For example, if the same intelligence test is given in two different institutions, percentile ranks may be computed in each institution. A student who receives a percentile rank of 63 in one institution may or may not have made a score similar to that made by a student in a second institution who also receives a percentile rank of 63. If the two distributions of scores are identical, scores may be compared directly by comparing percentile ranks. If no confidence can be put in the assumption that the distributions are identical, a comparison of percentile ranks has little or no meaning.

TABLE 11. Subtest and Total Test Raw Scores and Percentile Ranks for Two Students

               STUDENT A                STUDENT B
SUBTEST    RAW      PERCENTILE      RAW      PERCENTILE
           SCORE    RANK            SCORE    RANK
I          15       30              20       95
II         15       20              21       90
III         9       10              16       70
IV         18       10              23       70
Total      57       11              80       87

The mean of percentile ranks from several distributions may not be the same as the percentile rank of combined values from these distributions. For example, the raw scores and percentile ranks obtained by two students on four subtests and the total test are shown in Table 11. The mean of the four subtest percentile ranks is 17.5 for Student A and 81.25 for Student B. Because of the individual variability of these students among the four subtests, however, their total test scores place them at different points in the total score distribution (percentile ranks of 11 and 87, respectively) than would be expected from the mean of their four subtest percentile ranks.
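A short Python sketch reproducing the arithmetic behind Table 11. It only averages the printed percentile ranks and does not model the underlying score distributions.

```python
# Percentile ranks from Table 11: subtests I-IV, then the total test.
student_a = {"subtests": [30, 20, 10, 10], "total": 11}
student_b = {"subtests": [95, 90, 70, 70], "total": 87}

for name, ranks in (("A", student_a), ("B", student_b)):
    mean_pr = sum(ranks["subtests"]) / len(ranks["subtests"])
    print(f"Student {name}: mean of subtest PRs = {mean_pr}, total-test PR = {ranks['total']}")
# Student A: mean of subtest PRs = 17.5, total-test PR = 11
# Student B: mean of subtest PRs = 81.25, total-test PR = 87
```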

Although it is mathematically indefensible to add together percentile ranks to find a mean percentile rank, the computation of this value, no matter what it may be called, is useful and feasible whenever a rough value or approximation is needed. If the mean of the percentile ranks for a particular individual happens to be 90, however, it is poor interpretation to state that 90 per cent of the cases in a group will be lower than this individual.

Computation of Quartiles, Deciles, and Percentiles. The method of computing quartiles, deciles, and percentiles from frequency distributions is similar to the method employed in computing the median. With the median, computation consists of identifying that point in the distribution above which and below which one-half of the cases are found. With the quartiles, deciles, and percentiles, likewise, points in the distribution are to be identified, the only difference being the proportion, or percentage, of cases lying above or below the point to be identified. With the first quartile, the point is to be located below which one-fourth, or 25 per cent, of the cases are found; with the third quartile, the point below which three-fourths, or 75 per cent, of the cases are found; with the first decile, the point below which one-tenth, or 10 per cent, of the cases are found; with the fifty-fourth percentile, the point below which 54 per cent of the cases are found.

TABLE 12. Ages of Male College Students

AGE LAST        FREQUENCY    CUMULATIVE
BIRTHDAY                     FREQUENCY
30 and over        23           754
29                  3           731
28                 13           728
27                 12           715
26                 16           703
25                 34           687
24                 27           653
23                 72           626
22                 92           554
21                 90           462
20                104           372
19                108           268
18                105           160
17                 51            55
16                  4             4

For the purpose of computing these statistical measures, a distribution of the ages of male college students is shown in Table 12. The last column in the table is the cumulative frequency, which somewhat lessens the labor of the computation. Examination of the table reveals that the distribution of ages is definitely skewed and that, for the purpose of these computations, all students of 30 years of age or more were classified in one interval. Although such an interval may be acceptable in this instance, computation of the mean from a frequency distribution containing such an interval is of limited value, since an open-ended interval has no definite midpoint.

To compute the first quartile, that age in the distribution must be identified below which one-fourth of the 754 cases in the distribution lie. To identify this point, below which 188.5 cases are found, an inspection of the table reveals that the first quartile falls in the interval containing the 108 men who are nineteen years of age. Identification of the exact point may be more apparent if this interval is magnified as follows:

[Diagram: the nineteen-year interval, from 19 to 20 years, containing 108 cases; the first quartile divides it into 28.5 cases below $Q_1$ and 79.5 cases above $Q_1$.]

There are 160 men who are younger than nineteen. Therefore, 28.5 cases (188.5 − 160) of the 108 cases lying in the nineteen-year interval are needed. The first quartile, then, must lie at some point between nineteen and twenty years below which 28.5 cases will fall. If these 108 cases are equally distributed through the interval, and this assumption is made, it follows that the first quartile will be 28.5/108 of the distance from the bottom of the interval toward the top. Since 28.5/108 of one year is equal to 0.264 year, it follows that the first quartile lies at 19.264 years. Of course, this may be reduced to years and months, in which case the age at the first quartile would be approximately nineteen years and three months.

To compute the third quartile, it is necessary to find that point in the distribution below which three-fourths of the cases, or 565.5 cases, are found. An inspection of the table reveals that the third quartile lies somewhere in the interval from twenty-three to twenty-four years of age. Since 554 cases lie below twenty-three years, 11.5 (565.5 − 554) of the 72 cases in this interval are needed after entering this interval. That is,

$$Q_3 = 23 \text{ years} + \left(\frac{11.5}{72} \text{ of one year}\right) = 23.16 \text{ years}$$

To find the seventh decile, that point in the distribution must be identified below which 70 per cent of the cases lie: 0.7 × 754 = 527.8. Thus, the seventh decile is that point in the distribution below which 527.8 cases lie, and it is found somewhere in the twenty-two-year interval. Since 462 cases lie below twenty-two years, 65.8 (527.8 − 462) of the 92 cases in this interval are needed. Then

$$D_7 = 22 \text{ years} + \left(\frac{65.8}{92} \text{ of one year}\right) = 22.72 \text{ years}$$




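Percentiles are computed in a similar fashion. To find the fifty-fourth percentile, the point is identified in the distribution below which 54 per cent, or 407.16, of the cases lie. Since 372 cases lie below twenty-one years, 35.16 (407.16 − 372) of the 90 cases in the twenty-one-year interval are needed:

$$P_{54} = 21 \text{ years} + \left(\frac{35.16}{90} \text{ of one year}\right) = 21.39 \text{ years}$$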


As in the case of the calculation of the median of a frequency distribution, the foregoing reasoning can be expressed in terms of a formula. To compute $Q_1$, the formula would be

$$Q_1 = l + \left(\frac{N/4 - f_c}{f_w}\right) h$$

where
$l$ = theoretical lower limit of the interval containing $Q_1$
$f_c$ = cumulative frequency below the interval containing $Q_1$
$f_w$ = frequency within the interval containing $Q_1$
$N$ = total number of cases
$h$ = height (size) of the interval.

Therefore,

$$Q_1 = 19 \text{ years} + \left(\frac{754/4 - 160}{108}\right)(1) = 19.264 \text{ years}$$

This formula may be applied to find $Q_3$, any decile, or any percentile, provided that the fraction $N/4$ and the entries identifying $l$, $f_c$, and $f_w$ are changed accordingly. For example, for $Q_3$ the fraction would be $3N/4$; for $D_7$, $7N/10$; for $P_{54}$, $54N/100$; and so on.
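The interpolation formula can be sketched as a small Python function. The name `grouped_point` and its argument layout are illustrative choices, not from the text; like the text, it assumes the cases are spread evenly through each interval. The open-ended "30 and over" interval is given a nominal lower limit of 30, which does not matter here because none of the points computed falls in it.

```python
def grouped_point(intervals, fraction, h=1.0):
    """Point below which `fraction` of the cases lie, by linear interpolation.

    intervals: (lower_limit, frequency) pairs ordered from lowest to highest.
    fraction:  e.g. 0.25 for Q1, 0.70 for D7, 0.54 for P54.
    """
    n = sum(f for _, f in intervals)
    target = fraction * n              # e.g. N/4 for Q1
    cum_below = 0                      # f_c: cases below the current interval
    for lower, f in intervals:
        if f > 0 and cum_below + f >= target:
            # l + ((target - f_c) / f_w) * h
            return lower + (target - cum_below) / f * h
        cum_below += f
    raise ValueError("fraction must lie between 0 and 1")


# Table 12 (age last birthday), lowest interval first; 30 stands for "30 and over".
ages = [(16, 4), (17, 51), (18, 105), (19, 108), (20, 104), (21, 90),
        (22, 92), (23, 72), (24, 27), (25, 34), (26, 16), (27, 12),
        (28, 13), (29, 3), (30, 23)]

for name, frac in [("Q1", 0.25), ("Q3", 0.75), ("D7", 0.70), ("P54", 0.54)]:
    print(name, round(grouped_point(ages, frac), 3))
```

The results agree with the hand computations above (19.264, 23.16, 22.72, and 21.39 years).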


Computation of Percentile Ranks. Whereas a percentile is a point in a 
distribution, a percentile rank is a range of values in the distribution. 
The method of computing percentile ranks is readily illustrated. In Table 
13 is shown the distribution of raw scores of a mathematics placement 
test as tabulated for 3040 freshman students entering an engineering 
college. Each percentile rank is found by dividing the appropriate entry 
in the cumulative frequency column by the total number of cases, 3040, 
and then multiplying by 100. The result is always rounded upward to 
the nearest whole number. 




TABLE 13. Percentile Ranks of Mathematics Placement Test Scores
Based Upon 3040 Students

RAW    FRE-    CUMULATIVE  PERCENTILE   RAW    FRE-    CUMULATIVE  PERCENTILE
SCORE  QUENCY  FREQUENCY   RANK         SCORE  QUENCY  FREQUENCY   RANK
123    1       3040        100          78     91      1450        48
122    2       3039        100          77     83      1359        45
121    2       3037        100          76     77      1276        42
120    3       3035        100          75     65      1199        40
119    1       3032        100          74     68      1134        38
118    4       3031        100          73     67      1066        36
117    7       3027        100          72     69      999         33
116    6       3020        100          71     53      930         31
115    4       3014        100          70     52      877         29
114    8       3010        99           69     65      825         28
113    10      3002        99           68     46      760         25
112    9       2992        99           67     46      714         24
111    11      2983        99           66     45      668         22
110    13      2972        98           65     39      623         21
109    14      2959        98           64     43      584         20
108    5       2945        97           63     42      541         18
107    20      2940        97           62     39      499         17
106    19      2920        97           61     39      460         16
105    20      2901        96           60     40      421         14
104    20      2881        95           59     33      381         13
103    18      2861        95           58     29      348         12
102    23      2843        94           57     28      319         11
101    27      2820        93           56     29      291         10
100    30      2793        92           55     27      262         9
99     36      2763        91           54     18      235         8
98     40      2727        90           53     23      217         8
97     39      2687        89           52     22      194         7
96     38      2648        88           51     25      172         6
95     49      2610        86           50     19      147         5
94     ...     2561        ...          49     15      128         5
93     ...     ...         ...          48     16      113         4
92     ...     ...         ...          47     15      97          ...
91     ...     ...         ...          46     12      82          3
90     61      2341        77           45     ...     70          3
89     59      2280        75           44     ...     ...         ...
88     64      2221        ...          43     ...     ...         2
87     67      2157        71           42     10      39          2
86     71      2090        69           41     ...     29          1
85     74      2019        67           40     ...     ...         ...
84     73      1945        64           39     ...     ...         ...
83     80      1872        62           38     ...     ...         ...
82     ...     1792        59           37     ...     ...         ...
81     ...     ...         ...          36     ...     ...         ...
80     97      1617        54           35     ...     ...         ...
79     ...     ...         ...          34     ...     4           1




For a raw score of 34:

$$\text{P.R.} = \frac{4}{3040}(100) = 0.13, \text{ or } 1$$

For a raw score of 35:

$$\text{P.R.} = \frac{9}{3040}(100) = 0.29, \text{ or } 1$$


A continuation of this procedure yields all other percentile ranks. Tables 
containing raw scores and their corresponding percentile ranks are regu- 
larly used as a rapid means of converting additional raw scores on the 
same test to percentile ranks. 
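Such a conversion table can be sketched in Python. The helper name is ours; it applies the rule stated above (divide the cumulative frequency by 3040, multiply by 100, round upward) to a few rows read from Table 13. Note that a few printed entries in Table 13, such as the 99 for a raw score of 114, do not follow this rule exactly, so the sketch uses only rows that do.

```python
def percentile_rank(cum_freq, n):
    """Percentile rank = 100 * cum_freq / n, rounded upward to a whole number."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-100 * cum_freq // n)


N = 3040
# (raw score, cumulative frequency) pairs taken from Table 13
rows = [(34, 4), (75, 1199), (78, 1450), (106, 2920), (123, 3040)]
lookup = {score: percentile_rank(cf, N) for score, cf in rows}
print(lookup)  # {34: 1, 75: 40, 78: 48, 106: 97, 123: 100}
```

Once built, converting an additional raw score is a single dictionary lookup, which mirrors how printed conversion tables are used.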


Exercises 


The frequency distributions shown in the following table give the English test 
scores, the grade-point averages, and the ages in months of a group of college 
freshmen. The English test scores are reported according to the number of items 
correct, the grade-point averages to the nearest hundredth, and age as of the 
last whole month. 


ENGLISH TEST SCORE GRADE-POINT AVERAGES AGE 
INTERVAL FREQUENCY INTERVAL FREQUENCY INTERVAL FREQUENCY 

140-146 2 3.80-3.99 3 410-419 1 
133-139 0 3.60-3.79 5 400-409 0 
126-132 4 3.40-3.59 11 390-399 2 
119-125 7 3.20-3.39 29 380-389 2 
112-118 18 3.00-3.19 48 370-379 0 
105-111 35 2.80-2.99 63 360-369 4 
98-104 68 2.60-2.79 115 350-359 8
91-97 112 2.40-2.59 11 340-349 5 
84-90 187 2.20-2.39 203 330-339 6 
77-83 258 2.00-2.19 169 320-329 12 
70-76 179 1.80-1.99 137 310-319 17 
63-69 107 1.60-1.79 101 300-309 21 
56-62 64 1.40-1.59 46 290-299 27 
49-55 41 1.20-1.39 27 280-289 36 
42-48 23 1.00-1.19 14 270-279 43 
35-41 9 0.80-0.99 ... 260-269 49
28-34 5 0.60-0.79 2 250-259 55 
21-27 1 0.40-0.59 1 240-249 97 
14-20 3 0.20-0.39 1 230-239 130 
220-229 146 
210-219 172 
200-209 198 
190-199 78 
180-189 14 




1. a. Compute the lower quartile for the English test scores.
b. Compute the upper quartile for the English test scores.
2. a. Compute the seventh percentile for the English test scores.
b. Compute the eighty-fourth percentile for the English test scores.
3. Compute the nine decile points for the grade-point averages.
4. a. Compute the sixteenth percentile for the grade-point averages.
b. Compute the sixty-seventh percentile for the grade-point averages.
5. a. Compute the fifty-first percentile for the ages.




4 


Measures of Variability 


There is a constant need for characterizing a distribution in some other 
way than by describing it in terms of some point along a scale of values. 
Often a description of the spread of values is desired. It is quite con- 
ceivable that two different distributions having the same mean would 
differ from each other with respect to the spread of the values occurring 
in the distribution. In one distribution, all scores may be the same value. 
In this case, the mean must necessarily be this one value. On the other 
hand, a second distribution may have the same mean value with the 
scores scattered widely away from the mean. 

In one school the mean score from the administration of a test was 
55. From this mean value alone it is not possible to tell whether all the 
pupils obtained a score of 55 or whether the range of scores is wide with 
a mean value which falls at 55. It follows, therefore, that some measure is needed which will characterize the spread of the values appearing in the distribution. Variability means the amount of spread, or scatter, of the values around some measure of central tendency. Thus, the variability is small when the scores are bunched, and it is large when the scores are widely dispersed around some measure of central tendency. The measures of variability are sometimes called measures of dispersion.

There are five measures of variability which are widely used, viz., the 
range, the quartile deviation or semi-interquartile range, the mean or 
average deviation, the standard deviation, and the coefficient of variation. 
The appropriate measure to compute depends upon the interpretation 
which is to be drawn from the data, just as the appropriate measure of 
central tendency depends upon the description to be made. The recognition of appropriate and inappropriate interpretations of each of these measures constitutes an important part of the background necessary for the proper understanding of educational and psychological literature.

THE RANGE 


The simplest measure of variability is the range. It is the difference between the highest and lowest scores in a distribution. If a distribution should be plotted, the range becomes the distance from the lowest score to the highest score. According to a strict interpretation of the definition of variability, it is doubtful whether the range is actually a measure of variability. It does not give any idea of the nature of the spread in a distribution around the central tendency. However, it does give some indication of the spread of scores. If a distribution is plotted, and the curve is erased, leaving only the base line, the length of this base line is the range. Obviously, from this base line alone there is no indication concerning the shape of the distribution. This confirms the generally known principle that the range gives no indication concerning the location of any values in a distribution except two, the highest and lowest values. No indication is given as to the degree of concentration of the values within the range, or as to the part of the range where this concentration occurs. The values between the extremes may be spread evenly from the lowest value to the highest, or arranged in any other manner. So long as the lowest and highest values are unchanged, the range remains unchanged.

In most types of distributions the lowest value and the highest value are subject to large fluctuations from one sample to the next when successive samples are drawn from the same population. Thus, the range is a highly unstable measure of variability. At best, the range conveys only the crudest characterization of the variability which occurs in a distribution. The range does have the advantage that its meaning is evident to everyone.
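A tiny Python illustration of why the range says nothing about the values between the extremes. The two score lists are hypothetical: one has its values bunched at the centre, the other spread out, yet both have the same lowest and highest values.

```python
def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


bunched = [2, 5, 5, 5, 5, 9]   # hypothetical: most values at the centre
spread = [2, 3, 4, 7, 8, 9]    # hypothetical: values spread out
print(value_range(bunched), value_range(spread))  # 7 7
```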


THE QUARTILE DEVIATION OR SEMI-INTERQUARTILE 
RANGE 


The quartile deviation is one-half the difference between the values at the first and third quartiles. Whenever a distribution is plotted, the quartile deviation becomes one-half the distance between the first and third quartiles. The quartile deviation is synonymous with the semi-interquartile range. The latter term has the advantage that the term itself defines its meaning and the disadvantage of being cumbersome. The former term has the disadvantage that it is easily confused with quartile. The quartiles are points on the base line of a distribution, whereas the quartile deviation, like other measures of variability, is a certain distance along the base line. Furthermore, this confusion is augmented by the similarity in the commonly used symbols designating these measures. The quartile deviation is generally denoted by $Q$, the first quartile by $Q_1$, and the third quartile by $Q_3$. Some writers denote quartile deviation by Q.D., presumably for the purpose of avoiding confusion with $Q_1$ and $Q_3$.

The quartile deviation, like the range, represents only two values in a distribution, viz., $Q_1$ and $Q_3$. It has some advantage over the range in this respect, since the two values which determine the measure are much more stable. The least stable values in a distribution ordinarily appear at either end. The lowest one-fourth of the cases in a distribution determine the position of $Q_1$, and the highest one-fourth of the cases determine the position of $Q_3$. It follows that 50 per cent of the cases lie between $Q_1$ and $Q_3$. The quartile deviation, by definition, becomes one-half of the range of the middle 50 per cent of the distribution. Thought of in this manner, the quartile deviation is a measure which is easily understood.
When a distribution is plotted, the mean appears as a point on the base line. If a distance equal to the quartile deviation is laid off on both sides of the mean, the number of cases included between ordinates erected at these points will vary with the shape of the distribution. In Figure 13 are shown two distributions where ordinates have been erected at the ends of quartile deviation distances from the means. In a normal distribution exactly 50 per cent of the cases will be included, since the quartile deviation distance below the mean will fall exactly at $Q_1$, and above the mean, exactly at $Q_3$. It is surprising how nearly 50 per cent (not necessarily the middle 50 per cent) of the cases will fall within the limits of the quartile deviation from the mean, even when the distribution is moderately skewed.

[Figure: two frequency distributions, each with ordinates erected one quartile deviation below and above the mean; vertical axis, Frequency.]
Fig. 13. Ordinates at Quartile Distances from the Mean.
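The claim for the normal curve can be checked with Python's standard `statistics.NormalDist`. This sketch covers only the exactly normal case; it says nothing about the skewed curves of Figure 13.

```python
from statistics import NormalDist

# Standard normal curve as the reference distribution (an assumption of the sketch).
dist = NormalDist(mu=0.0, sigma=1.0)
q1, q3 = dist.inv_cdf(0.25), dist.inv_cdf(0.75)
q = (q3 - q1) / 2                                   # quartile deviation

# Proportion of cases between ordinates at mean - Q and mean + Q
share = dist.cdf(dist.mean + q) - dist.cdf(dist.mean - q)
print(round(q, 4), round(share, 4))   # 0.6745 0.5
```

The quartile deviation of a normal curve is thus about 0.6745 standard deviations, and exactly half the cases fall within one quartile deviation of the mean.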
Computation of the Quartile Deviation. No difficulty is experienced in the computation of the quartile deviation or the semi-interquartile range, which, by definition, is equal to one-half of the difference between the first and the third quartiles. The formula is

$$Q = \frac{Q_3 - Q_1}{2}$$

Thus, if the first and third quartiles in a distribution of test scores are 12 and 46, respectively, the quartile deviation is obtained by substituting these values in the formula and is found to be 17.

$$Q = \frac{46 - 12}{2} = 17$$
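A minimal Python version of the formula. Besides the test-score example, it is applied, as our own illustration, to the quartiles of the ages in Table 12 computed earlier in the chapter ($Q_1$ = 19.264 and $Q_3$ = 23.16 years).

```python
def quartile_deviation(q1, q3):
    """Semi-interquartile range: half the distance from Q1 to Q3."""
    return (q3 - q1) / 2


print(quartile_deviation(12, 46))                   # 17.0
# Applied to the quartiles of the ages in Table 12
print(round(quartile_deviation(19.264, 23.16), 3))  # 1.948
```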


THE MEAN DEVIATION, OR AVERAGE DEVIATION 


The mean deviation or average deviation of a distribution is the mean of the absolute deviations of all the individual values in a distribution from the mean, or from the median. Suppose a distribution is made up of the nine values shown in the following data sheet. The mean of these values is 10. The value 14 has an absolute deviation from the mean of four points; 13, three points; 12, two points; 11, one point; 10, zero deviation; 9, one point; and so forth. The mean of these individual absolute deviations is seen from the table to be 2.22. By definition, this is the mean deviation, or average deviation. Like the range and the quartile deviation, this measure has the advantage of being easily understood: the average amount by which the values deviate from the mean, or the median, regardless of direction, carries a definite meaning. Whereas the range and the quartile deviation depend upon the magnitudes of the values only through the arrangement of the data in order, the computation of the mean deviation takes into account the size of each individual value occurring in the distribution.


          ABSOLUTE DEVIATION
VALUE     FROM THE MEAN
  14            4
  13            3
  12            2
  11            1
  10            0
   9            1
   8            2
   7            3
   6            4
Total 90       20

$$\text{M.D.} = \frac{\Sigma |x|}{N} = \frac{20}{9} = 2.22$$
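The computation on the data sheet can be sketched in Python for the same nine values.

```python
# Mean (average) deviation about the mean: the average absolute deviation.
values = [14, 13, 12, 11, 10, 9, 8, 7, 6]

mean = sum(values) / len(values)                        # 10.0
abs_devs = [abs(v - mean) for v in values]              # signs disregarded
mean_deviation = sum(abs_devs) / len(values)            # 20 / 9
print(round(mean_deviation, 2))                         # 2.22
```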




It should be noted that the sign of the deviation from the mean is disregarded in computing the mean deviation; that is, whether a value lies above or below the mean is ignored. If the measure is computed only to describe the distribution, disregarding the signs is of no consequence. However, there are other uses for a measure of variability, as will be shown later. For example, a measure of deviation is needed in computing a coefficient of correlation.

It is a mathematical principle that a measure computed by disregarding signs does not lend itself to further mathematical treatment. Consequently, the mean deviation, once computed, cannot be used as an intermediate product in the computation of other measures. Probably for this reason, the mean deviation is seldom computed. When it is used, the purpose is generally to interpret the amount of variability in a distribution to those unfamiliar with standard deviations. The measure is sometimes estimated rather than actually computed. This estimate is obtained by taking four-fifths of the standard deviation, a procedure which is valid only when the distribution is normal.
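Where the four-fifths rule comes from can be sketched in Python. For an exactly normal distribution the mean deviation equals $\sigma\sqrt{2/\pi}$, about $0.7979\sigma$. The numerical integration below is only a check on that constant for a normal curve with $\sigma = 1$.

```python
import math
from statistics import NormalDist

# For an exactly normal distribution, mean deviation = sigma * sqrt(2 / pi).
print(round(math.sqrt(2 / math.pi), 4))   # 0.7979, close to four-fifths

# Numerical check: integrate |x| times the normal density (sigma = 1)
# by the midpoint rule over -8 to 8.
dist = NormalDist(mu=0.0, sigma=1.0)
step = 0.001
md = sum(abs(x) * dist.pdf(x) * step
         for x in (-8 + step * (i + 0.5) for i in range(16000)))
print(round(md, 4))                        # 0.7979
```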


THE STANDARD DEVIATION 


The standard deviation is the square root of the mean of the squares 
of the individual deviations from the mean of a distribution. It is obvi- 
ous from this definition that computation of the standard deviation, like 
that of the mean deviation, takes into account the size of every value 
in a distribution. In contrast to the mean deviation the standard deviation 
can be mathematically treated, since the signs of the deviations have 
not been disregarded. A distribution composed of nine hypothetical values 
is shown in the following table: 


          DEVIATION FROM    SQUARED DEVIATION
VALUE     THE MEAN          FROM THE MEAN
  X          x                  x²
 14         +4                 +16
 13         +3                  +9
 12         +2                  +4
 11         +1                  +1
 10          0                   0
  9         −1                  +1
  8         −2                  +4
  7         −3                  +9
  6         −4                 +16
Total 90     0                  60

It should be noted that the deviations involved in the computation of the standard deviation retain the sign indicating their position with respect to the mean. Values above the mean carry a positive sign and those below the mean a negative sign. It will be recalled that the square of either a positive or a negative number is positive. Each of the squares of the individual deviations, therefore, will carry a positive sign, as shown in the third column.

Since the mean of these values is 10, the deviations in the second 
column are the amounts the values differ from the mean. The sum of these 
deviations will be zero. The sum of the squared deviations is 60, and the 
mean of these squared deviations is 6.67. The square root of 6.67 is 2.58. 
The standard deviation, then, is 2.58. 

The idea behind the standard deviation can be summarized as follows: Signs should not be disregarded in the computation of a measure of variability; therefore, the sign is retained. It is an algebraic principle that both positive and negative numbers are positive when squared; consequently, by squaring the deviations, the negative signs disappear without being disregarded. The mean of these squared deviations is then computed in order to find the average squared deviation from the mean. Because it is expressed in squared units, this mean of the squared deviations may be much larger than would be expected from the size of the values in the distribution. In order to bring this value back to a size which is in line with the numerical values in a distribution, the square root of the mean of the squared deviations is found. This, then, constitutes the standard deviation.


The mean of the squared deviations is called the variance. The variance of a set of numerical data has many important characteristics which are utilized in the testing of hypotheses. These characteristics of variance and their uses will be discussed in later chapters. Variance, of course, can be changed to standard deviation by extracting the square root. The standard deviation is designated by the Greek letter sigma ($\sigma$). In fact, the term standard deviation and sigma are used interchangeably when referring to the standard deviation of a set of data considered to be a population.

When the standard deviation of a normal distribution is known, the proportion of cases lying between values of certain size can be determined. These relations are found by adding the standard deviation to, and subtracting it from, the mean. Within the limits of one sigma distance on each side of the mean will be found approximately 68 per cent, or two-thirds, of the cases; within two sigma lengths on each side of the mean, approximately 95 per cent of the cases will be found; and practically all the cases will be found within three sigma lengths from the mean. This relationship varies somewhat as the distribution becomes skewed. However, little change is made in the number of cases appearing at different sigma lengths even when there is some departure from the normal distribution.
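The one-, two-, and three-sigma proportions can be checked against the normal curve with Python's standard `statistics.NormalDist`. This sketch applies only to the exactly normal case.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)   # the normal curve in sigma units
coverage = {k: dist.cdf(k) - dist.cdf(-k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} sigma of the mean: {share:.4f}")
```

The printed proportions are about 0.6827, 0.9545, and 0.9973.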

The relationship between the standard deviation of normally distributed data and the number of cases lying between values of certain sizes may be illustrated by an example involving the heights of adult men living in a community:

Mean                  5 ft. 9 in.
Standard deviation    2 in.

Recall that, in a normal distribution, approximately two-thirds of the cases lie between a standard deviation distance below the mean and a standard deviation distance above the mean. One standard deviation distance below the mean will extend to 5 ft. 7 in., and one standard deviation distance above the mean will extend to 5 ft. 11 in. If it can be assumed that the heights of the adult men in these data follow a normal curve, approximately two-thirds of the men range in height between 5 ft. 7 in. and 5 ft. 11 in. Laying off two sigma distances above and below the mean, it follows that approximately 95 per cent of these men are between 5 ft. 5 in. and 6 ft. 1 in.; laying off three sigma distances, it is a practical certainty that the heights of virtually all the men will lie between 5 ft. 3 in. and 6 ft. 3 in.
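The limits in the height example can be laid off in Python by working in inches (5 ft. 9 in. = 69 in.). The helper `ft_in` is our own formatting function.

```python
def ft_in(inches):
    """Format a height in inches as feet and inches."""
    feet, rem = divmod(round(inches), 12)
    return f"{feet} ft. {rem} in."


mean, sd = 69, 2          # 5 ft. 9 in. and 2 in., expressed in inches
for k, share in ((1, "about two-thirds"), (2, "about 95 per cent"), (3, "practically all")):
    print(f"{share}: {ft_in(mean - k * sd)} to {ft_in(mean + k * sd)}")
```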

The standard deviation is the most widely used measure of variability. Although this measure is somewhat more difficult to understand than the measures of variability previously discussed, its use seems to be justified because, unlike the mean deviation, it can be used in later calculations.

Computation of the Standard Deviation. Two methods will be presented 
for computing the standard deviation, one being employed with un- 
grouped data and the other with data grouped into a frequency distribu- 
tion. 

The standard deviation may be obtained by (1) computing the mean, (2) subtracting the mean from each value in the distribution, (3) squaring each deviation, (4) summing the squares of the deviations, (5) dividing this sum by the number of cases, and (6) extracting the square root of the quotient. These six steps, obviously, will produce the standard deviation, since it is defined as the square root of the mean of the squares of the deviations from the mean. Although following these six steps makes clear the meaning of the standard deviation, the procedure becomes unwieldy when applied to most distributions. The two methods described in the following paragraphs have been developed in order to produce the same result with less labor.
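The six steps translate directly into a short Python sketch, applied here to the nine hypothetical values used earlier in this section. The function name is ours.

```python
import math


def standard_deviation_by_definition(values):
    n = len(values)
    mean = sum(values) / n                          # (1) compute the mean
    deviations = [v - mean for v in values]         # (2) subtract the mean from each value
    squares = [d * d for d in deviations]           # (3) square each deviation
    total = sum(squares)                            # (4) sum the squares
    mean_square = total / n                         # (5) divide by the number of cases
    return math.sqrt(mean_square)                   # (6) extract the square root


print(round(standard_deviation_by_definition([14, 13, 12, 11, 10, 9, 8, 7, 6]), 2))  # 2.58
```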
Computation for Ungrouped Data. The formula for the standard deviation follows directly from its definition and may be written as follows:


$$\sigma = \sqrt{\frac{\Sigma x^2}{N}}$$

where $x$ = score in deviation form, i.e., $X - \bar{X}$
$N$ = number of cases in the distribution

From the previously described definition of the standard deviation, it will be recalled that $\Sigma x^2$ can be obtained by algebraically subtracting the mean from each value in the distribution, then squaring and summing these deviations. When the data are ungrouped, as will ordinarily be the case, it is far more convenient to obtain $\Sigma x^2$ from the raw scores, as is shown in the following derivation:


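Let $X$ = raw score and $\bar{X}$ = mean.

$$x = X - \bar{X} \qquad \text{(by definition of a deviation score)}$$

$$x^2 = X^2 - 2X\bar{X} + \bar{X}^2$$

Summing,

$$\Sigma x^2 = \Sigma X^2 - 2\bar{X}\Sigma X + N\bar{X}^2 \qquad (a)$$

but $\bar{X} = \dfrac{\Sigma X}{N}$, so that $N\bar{X} = \Sigma X$ and $N\bar{X}^2 = \bar{X}\Sigma X$.

Substituting $\bar{X}\Sigma X$ for $N\bar{X}^2$ in equation (a),

$$\Sigma x^2 = \Sigma X^2 - 2\bar{X}\Sigma X + \bar{X}\Sigma X$$

$$\Sigma x^2 = \Sigma X^2 - \bar{X}\Sigma X \qquad (b)$$

Substituting $\dfrac{\Sigma X}{N}$ for $\bar{X}$ in equation (b),

$$\Sigma x^2 = \Sigma X^2 - \frac{\Sigma X}{N}\,\Sigma X = \Sigma X^2 - \frac{(\Sigma X)^2}{N}$$

By inserting the foregoing identity in the original formula, it may be written as

$$\sigma = \sqrt{\frac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N}}$$

or

$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - \bar{X}^2}$$

or, by multiplying both the numerator and denominator under the radical by $N$,

$$\sigma = \frac{1}{N}\sqrt{N\Sigma X^2 - (\Sigma X)^2}$$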
By substituting in any of these formulas, the standard deviation of ungrouped data can be found. The standard deviation of a distribution of 10 scores in the following data sheet has been computed by means of the equivalent formulas using raw scores.


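X       X²
6       36
7       49
...     ...
3        9
4       16
5       25
6       36
7       49
Total 60    390

COMPUTATIONS

$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N} = 390 - \frac{(60)^2}{10} = 30 \qquad \sigma = \sqrt{\frac{\Sigma x^2}{N}} = \sqrt{\frac{30}{10}} = 1.73$$

$$\bar{X} = \frac{\Sigma X}{N} = \frac{60}{10} = 6, \quad \bar{X}^2 = 36 \qquad \sigma = \sqrt{\frac{\Sigma X^2}{N} - \bar{X}^2} = \sqrt{\frac{390}{10} - 36} = 1.73$$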


It is apparent that both approaches have yielded mathematically iden- 
tical results. 
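The agreement of the raw-score formulas can be checked in Python using only the totals from the ten-score data sheet ($N$ = 10, $\Sigma X$ = 60, $\Sigma X^2$ = 390). This is a sketch of the arithmetic, not a new method.

```python
import math

# Totals from the ten-score data sheet above.
n, sum_x, sum_x2 = 10, 60, 390

# Deviation sum of squares from raw-score totals: Σx² = ΣX² − (ΣX)²/N
sum_dev_sq = sum_x2 - sum_x ** 2 / n            # 30.0
sigma_a = math.sqrt(sum_dev_sq / n)

# Equivalent form: σ = sqrt(ΣX²/N − X̄²)
mean = sum_x / n
sigma_b = math.sqrt(sum_x2 / n - mean ** 2)

# Equivalent form: σ = (1/N) sqrt(NΣX² − (ΣX)²)
sigma_c = math.sqrt(n * sum_x2 - sum_x ** 2) / n

print(round(sigma_a, 2), round(sigma_b, 2), round(sigma_c, 2))  # 1.73 1.73 1.73
```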

To this point, consideration has been given only to the calculation of the standard deviation of a distribution of scores which has not been considered a part of a larger distribution. In another chapter it is pointed out that investigators frequently wish to consider the set of measurements with which they are working as a sample out of a larger population. In this case $\sigma$ represents the standard deviation of the population and as such cannot be calculated from the information at hand. However, $\sigma$ can be estimated from the sample data by means of the formula

$$s = \sqrt{\frac{\Sigma x^2}{N - 1}}$$

where $s$ is the estimate of the standard deviation of the population. It should also be pointed out that if the variance of the population is to be estimated from a sample of that population, the formula

$$s^2 = \frac{\Sigma x^2}{N - 1}$$

can be used.
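Python's standard `statistics` module happens to provide both divisors: `pstdev` divides $\Sigma x^2$ by $N$, while `stdev` and `variance` divide by $N - 1$, giving $s$ and $s^2$. The sketch below applies them to the nine hypothetical values used earlier in this section.

```python
from statistics import pstdev, stdev, variance

values = [14, 13, 12, 11, 10, 9, 8, 7, 6]   # the nine hypothetical values used earlier

print(round(pstdev(values), 2))   # 2.58, divides Σx² by N:     sqrt(60 / 9)
print(round(stdev(values), 2))    # 2.74, divides Σx² by N − 1: sqrt(60 / 8), the estimate s
print(variance(values))           # s² = 60 / 8
```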

Computation for Grouped Data. In some cases in which the standard 
deviation is desired, the data are already assembled into a frequency 
distribution. The formula for computing the standard deviation in a 
frequency distribution is 


Tf the intelligence quotients of 167 high school pupils used in a preceding 
chapter are transferred to the following form, the procedure to be fol- 
lowed should be evident: 


I.Q. INTERVALS   f    d    fd    fd²
145-149          1    8     8     64
140-144          3    7    21    147
135-139          5    6    30    180
130-134          8    5    40    200
125-129         11    4    44    176
120-124         17    3    51    153
115-119         21    2    42     84
110-114         22    1    22     22
105-109         24    0     0      0
100-104         20   −1   −20     20
95-99           15   −2   −30     60
90-94           12   −3   −36    108
85-89            6   −4   −24     96
80-84            2   −5   −10     50
Total          167         138   1360


Substituting in the formula,

$$\sigma = 5\sqrt{\frac{1360}{167} - \left(\frac{138}{167}\right)^2} = 13.66$$
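The grouped-data computation can be reproduced in a few lines of Python. The sketch assumes, as in the table, an interval width of 5 and an arbitrary origin at the interval 105-109 (d = 0); the helper name `grouped_sd` is hypothetical.

```python
import math

# I.Q. distribution of 167 pupils: (lower limit of interval, frequency)
IQ_INTERVALS = [(145, 1), (140, 3), (135, 5), (130, 8), (125, 11), (120, 17),
                (115, 21), (110, 22), (105, 24), (100, 20), (95, 15), (90, 12),
                (85, 6), (80, 2)]

def grouped_sd(intervals, width, origin_lower):
    """sigma = i * sqrt(sum fd^2 / N - (sum fd / N)^2), with d measured in
    interval units from the interval whose lower limit is origin_lower."""
    n = sum(f for _, f in intervals)
    sum_fd = 0
    sum_fd2 = 0
    for lower, f in intervals:
        d = (lower - origin_lower) // width  # deviation in interval units
        sum_fd += f * d
        sum_fd2 += f * d * d
    sd = width * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return n, sum_fd, sum_fd2, sd

n, sum_fd, sum_fd2, sd = grouped_sd(IQ_INTERVALS, width=5, origin_lower=105)
print(n, sum_fd, sum_fd2, round(sd, 2))  # 167 138 1360 13.66
```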


THE COEFFICIENT OF VARIATION 


The coefficient of variation is the per cent which the standard deviation is of the mean. In other words, it is the standard deviation divided by the mean and multiplied by 100. It should be noticed that the coefficient of variation, by definition, can scarcely be called a measure of variability. Variability, in its exact meaning, refers to the degree to which values
cluster around some measure of central tendency. Obviously, the ratio 
between the standard deviation, a true measure of dispersion, and the 
mean does some violence to the usual concept of variability. 

The size of a standard deviation depends upon the unit in which it is 




measured. The purpose of the coefficient of variation, it would seem, is to 
change the numerical amount of the standard deviation so that it is not 
a function of the size of the unit employed. An example may make this 
meaning clear. Suppose that the mean weight of a group of high school 
boys is reported to be 120 pounds, with a standard deviation of 15 
pounds. These same data might as well have been reported: mean weight, 
1,920 ounces, standard deviation 240 ounces. Obviously, the standard 
deviation of 15 pounds cannot be compared directly with the standard 
deviation of 240 ounces because they are expressed in different units. 
If the coefficient of variation of the data expressed in pounds should be computed, a value of 12.5 per cent is obtained. Likewise, when the coefficient of variation of the data expressed in ounces is computed, a value of 12.5 per cent is obtained. It can be seen in this example that the coefficient of variation
accomplishes the purpose for which it was intended. It may be pointed 
out that this same purpose might be accomplished in a much simpler 
manner by dividing the standard deviation expressed in ounces by 16 in 
order to obtain the standard deviation in pounds. 
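The pounds-and-ounces example amounts to two divisions. A minimal Python sketch (the function name is hypothetical) shows that the ratio is unchanged when only the size of the unit changes.

```python
def coefficient_of_variation(sd, mean):
    """Per cent which the standard deviation is of the mean."""
    return 100 * sd / mean

cv_pounds = coefficient_of_variation(15, 120)    # weights in pounds
cv_ounces = coefficient_of_variation(240, 1920)  # the same weights in ounces
print(cv_pounds, cv_ounces)  # 12.5 12.5

# The simpler route mentioned in the text: change the unit directly
print(240 / 16)  # 15.0 pounds
```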

A generalization has been made, consciously or unconsciously, from 
examples similar to the foregoing one, that the coefficient of variation 
automatically reduces the standard deviation of different distributions to 
units that may be compared directly. From this generalization, some 
textbooks in statistics have attempted to compare the variability of 
group achievement in two different subject areas, as for example, in 
algebra and in English. Desirable as such a comparison may be, it is not 
at all possible to make it from the results obtained by computing coeffi- 
cients of variation. In situations of this type, obviously, the generaliza- 
tion does not hold. 

As a further indication that such a generalization is not necessarily 
true, consider the hypothetical temperature data for one year when the 
daily temperatures are expressed first in Fahrenheit degrees and then in 
centigrade degrees. If these data are the same, the coefficient of variation, 
if appropriate for the purpose intended, must give identical values when 
applied to the Fahrenheit and centigrade data. Obviously, this is not the 
case. The usual interpretations of the coefficient of variation would suggest that the variability of the temperature is about twice as great in one case as in the other. Naturally, such an interpretation is not sound.

                          FAHRENHEIT   CENTIGRADE
Mean                          59           15
Standard deviation            18           10
Coefficient of variation      30.5         66.7

It may be pointed out that this discrepancy is due to the fact that the zero point of temperature does not coincide with the zero points on either the centigrade or Fahrenheit scale and that, furthermore, had the means been computed from the actual absolute zero points, the coefficients of variation found would have been identical. However, neither does the zero point of scales used to measure achievement coincide with the zero point of no achievement. The coefficient of variation can be made a sound procedure for treating scores in different subjects whenever the value in each subject is expressed in units above absolutely no achievement. Since the point of no achievement is entirely unknown, there is no alternative except to discontinue the use of the coefficient of variation for the interpretation of variability of achievement.
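The temperature discrepancy, and its disappearance when values are measured from absolute zero, can be checked numerically. This Python sketch uses the hypothetical means and standard deviations from the table above and assumes the conventional conversions Kelvin = centigrade + 273.15 and Rankine = Fahrenheit + 459.67. It illustrates the argument and is not a procedure from the text.

```python
def coefficient_of_variation(sd, mean):
    """Per cent which the standard deviation is of the mean."""
    return 100 * sd / mean

# Hypothetical temperature data from the table
print(round(coefficient_of_variation(18, 59), 1))  # 30.5  (Fahrenheit)
print(round(coefficient_of_variation(10, 15), 1))  # 66.7  (centigrade)

# The same data measured from absolute zero (Kelvin and Rankine scales);
# a change of scale unit alone no longer alters the ratio
cv_kelvin = coefficient_of_variation(10, 15 + 273.15)
cv_rankine = coefficient_of_variation(18, 59 + 459.67)
print(round(cv_kelvin, 2), round(cv_rankine, 2))   # 3.47 3.47
```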

It is probable that, in the final analysis, the only times the coefficient 
of variation can be interpreted as relative variability are (1) when the 
zero point of the scale coincides with the zero point of the characteristic 
measured by the scale and (2) when the difference in the standard devia- 
tion in the two distributions results from the differences in the size of the 
units employed. In this latter case, all that is necessary to be done is to
transfer one standard deviation into the same units in which the other is 
expressed. When this is done, the relative variability can be compared 
directly. 

STANDARD SCORES

The score made by any given student in an examination is a function of the length of the test and the average difficulty of the items. In some cases interpretation of results is facilitated when these scores are expressed in units which are not functions of test length or difficulty. The most frequently employed unit for this purpose is a standard deviation unit above or below the mean, called an x/σ standard score. Thus, an x/σ standard score of 0.42 made by a given student can be interpreted as 42/100 of a standard deviation distance above the mean score made by a group; an x/σ standard score of −2.00 as two standard deviation distances below the mean score of his fellows, and so forth.

The formula for obtaining x/σ standard scores follows directly from the definition and may be written

$$\frac{x}{\sigma} = \frac{X - \bar{X}}{\sigma}$$

where

x/σ = an x/σ standard score
X = a test score
X̄ = mean of the test scores
σ = standard deviation of the test scores



If an x/σ standard score should be desired for a student making a score of 60 in a test having a mean score of 70 and a standard deviation of scores of 20, substitution in the formula will produce −0.5 as the x/σ standard score for this student:

$$\frac{x}{\sigma} = \frac{60 - 70}{20} = -0.5$$

In a similar way, standard scores may be obtained for each of the students comprising the group.
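Computing x/σ standard scores takes one subtraction and one division per score. In the Python sketch below, the first call reproduces the example in the text; the other raw scores are hypothetical, scored against the same mean of 70 and standard deviation of 20.

```python
def standard_score(raw, mean, sd):
    """x/sigma standard score: (X - mean) / standard deviation."""
    return (raw - mean) / sd

# The example in the text: score 60, mean 70, standard deviation 20
print(standard_score(60, 70, 20))  # -0.5

# Hypothetical raw scores for other members of the same group
group_scores = [40, 55, 60, 70, 85, 110]
group_standard = [standard_score(x, 70, 20) for x in group_scores]
print(group_standard)  # [-1.5, -0.75, -0.5, 0.0, 0.75, 2.0]
```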

There are three considerations which should be recognized in the interpretation of standard scores. The first consideration is that the unit of measure utilized in standard scores is constant throughout the range whenever the characteristic or trait follows a normal curve in the group considered. For example, in a given set of standard scores, the difference between the scores of 1.0 and 1.5 represents the same number of raw score units as the difference between the scores of 2.0 and 2.5, or a 0.5 difference in any other part of the scale. It will be recalled that this circumstance does not prevail for percentile values even though they are obtained from normally distributed raw scores.

The second consideration is that standard scores are relative values 
referring only to the distribution utilized in the computation. For ex- 
ample, if an algebra test is administered to groups of ninth-grade pupils
in two cities and in each school standard scores are computed for each 
pupil, similar standard scores in the two cities may or may not represent 
similar achievement in terms of raw scores made on the test. 

The third consideration is that, if the mean standard score is obtained 
for an individual for more than one characteristic, each characteristic 
is weighted equally in the composite value. If a single value is desired 
which weights the characteristics according to some judgment of relative 
importance, then the separate standard scores must be weighted individu- 
ally in the computation of the mean. 
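The weighting described in the third consideration is a weighted mean. A minimal Python sketch, using a hypothetical pupil and a hypothetical weight of 2 for the first characteristic (the function name is also illustrative):

```python
def composite(standard_scores, weights=None):
    """Mean of a pupil's standard scores; equal weights unless weights are given."""
    if weights is None:
        weights = [1] * len(standard_scores)
    weighted_sum = sum(w * s for w, s in zip(weights, standard_scores))
    return weighted_sum / sum(weights)

# Hypothetical pupil: +1.0 on one characteristic, -0.5 on another
pupil = [1.0, -0.5]
print(composite(pupil))          # 0.25 (each characteristic weighted equally)
print(composite(pupil, [2, 1]))  # 0.5  (first characteristic weighted twice)
```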

When a group of normally distributed raw scores has been converted to x/σ standard scores, the standard scores will range from about −3 to about +3, their mean will be zero, and their standard deviation unity.

Since the small numerical size of x/σ standard scores and the presence of negative signs sometimes make them awkward to handle, standard scores with larger ranges, means, and standard deviations have been developed by converting the x/σ standard scores. The expression 10(x/σ) + 50 will change x/σ standard scores to standard scores with a mean of 50 and a standard deviation of 10. These scores will range from about 20 to about 80. The expression 20(x/σ) + 100 will change x/σ standard scores to standard scores with a mean of 100, a standard deviation of 20, and a range from approximately 40 to 160. By following this pattern, standard scores with means and standard deviations of convenient sizes can be determined.
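Both conversions are linear changes of scale applied to the x/σ standard score. A short Python sketch (the function name `rescale` is hypothetical):

```python
def rescale(x_over_sigma, new_sd, new_mean):
    """Convert an x/sigma standard score to a scale with the given SD and mean."""
    return new_sd * x_over_sigma + new_mean

for score in (-3.0, -0.5, 0.0, 3.0):
    # 10(x/sigma) + 50 and 20(x/sigma) + 100
    print(score, rescale(score, 10, 50), rescale(score, 20, 100))
```

The extreme scores of −3 and +3 map to 20 and 80 on the first scale and to 40 and 160 on the second, matching the ranges given above.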


THE NORMAL DISTRIBUTION CURVE 


If an infinitely large number of measurements were made of a characteristic which is normally distributed and these scores were converted to x/σ standard scores, the distribution of the standard scores would, of course, also be normal. The ensuing normal curve, shown in Figure 14, provides a convenient basis upon which to describe a highly useful table of normal curve values. Such a table is the table of Ordinates and Areas of the Normal Curve shown in the Appendix.

Fig. 14. The Normal Distribution of x/σ Standard Scores. (Frequency plotted against x/σ from −3 to +3.)
The values in this table are based upon an area of unity and a standard deviation of unity for the normal curve. The entries for the table have been found by successive solution of the equation of the normal curve.¹ The normal curve table is composed of three parts, viz., the positions on the base line expressed in terms of x/σ standard scores,² the height of the ordinate at a given base line position, and the area under the curve enclosed by any two ordinates. The center of the base line, i.e., the point at which x/σ = 0, is used as the beginning point for the table. At this point, the ordinate height is 0.3989, and the area under the curve between this ordinate and the ordinate located at the starting point is, of course, zero. When x/σ is 1.34, for example, the height of the ordinate is 0.1626 and the area under the curve between the ordinate at this point and the ordinate at the starting point is 0.4099, as shown in Figure 15. In a similar manner all other values in the table can be interpreted.

¹ The equation for the normal curve is

$$z = \frac{N}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$$

where z is the height of the curve above the base line at any point along the base line, x = score deviation from the mean, N = number of cases in the distribution, σ = standard deviation of the distribution, π = 3.1416, and e = 2.7183.

² When considered as a part of the table of the normal curve, these values are sometimes referred to as sigma distances or as normal deviate values.

Fig. 15. Area Under the Normal Curve Between Two Ordinates.

Because the normal curve is symmetrical, the tabled values are given for only one-half of the curve. Thus the area values, for example, range from zero to practically 0.5. Negative values of x/σ would indicate base line positions below the mean and would have ordinate values and area values identical with corresponding positive values of x/σ. Therefore the table can be readily applied to any part of the normal curve.

A simple illustration of the use of the normal curve table can be explained by referring to Figure 14. If one standard deviation distance is laid off on the base line on each side of the mean, ordinates located at these two points will strike the normal curve at the points of inflection, i.e., at the points in the curve at which it changes from concave to convex. Since the area under the normal curve represents the number of cases, the fact that approximately two-thirds of the standard scores lie between the two points in question, x/σ = +1 and x/σ = −1, should be identified in the area column of the normal curve table. Since the area under the curve between the ordinate at x/σ = +1 and the ordinate at x/σ = 0 is 0.3413, the area under the curve between the ordinate at x/σ = −1 and the ordinate at x/σ = 0 is also 0.3413. Therefore the total area under the curve between the ordinates at x/σ = +1 and x/σ = −1 is 0.6826. In other words, in the distribution of x/σ standard scores, 68.26 per cent of the scores will fall between x/σ = +1 and x/σ = −1. In a similar manner it can be shown that 95.44 per cent of the x/σ standard scores will fall between x/σ = +2 and x/σ = −2.
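The table entries used above can be approximated without the printed table. The Python sketch below assumes Python 3.8 or later, whose `statistics.NormalDist` supplies the cumulative area; the ordinate is computed directly from the equation of the normal curve with N = 1 and σ = 1. The function names are hypothetical.

```python
import math
from statistics import NormalDist

UNIT_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # area of unity, standard deviation of unity

def ordinate(t):
    """Height z of the unit normal curve at base-line position t = x/sigma."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def area_from_mean(t):
    """Area between the ordinate at x/sigma = 0 and the ordinate at x/sigma = t."""
    return abs(UNIT_NORMAL.cdf(t) - 0.5)

def area_between(a, b):
    """Proportion of the total area lying between the ordinates at a and b."""
    return UNIT_NORMAL.cdf(b) - UNIT_NORMAL.cdf(a)

print(round(ordinate(0.0), 4), round(ordinate(1.34), 4))  # 0.3989 0.1626
print(round(area_from_mean(1.34), 4))                     # 0.4099
print(round(area_between(-1, 1), 4))                      # 0.6827
print(round(area_between(-2, 2), 4))                      # 0.9545
```

The last digit differs from the 0.6826 and 0.9544 of the text because those figures double the four-place table entries 0.3413 and 0.4772.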


Fig. 16. Proportions of the Area Under the Normal Curve Determined by the Ordinate at x/σ = −0.5.

A further demonstration of the use of the normal curve table can be found in the conversion of x/σ standard scores to percentile ranks. This conversion is appropriate only when the standard scores have been derived from normally distributed raw scores. If, for example, an x/σ standard score of −0.5 were to be converted to a percentile rank, the area under the normal curve found to the left of an ordinate located at x/σ = −0.5 indicates the corresponding percentile rank. To determine the size of this area, shown as the shaded area in Figure 16, it is first necessary to find the area under the curve between the ordinate at x/σ = −0.5 and the ordinate at x/σ = 0. According to the normal curve table, the latter area is 0.1915. Hence the shaded area equals 0.5000 minus 0.1915, or 0.3085. Since percentile ranks are rounded upward, the x/σ standard score is equivalent to the thirty-first percentile rank.

Converting an x/σ standard score of +1.2 to the corresponding percentile rank can be explained by means of Figure 17. In this instance the area to the left of the ordinate located at the x/σ standard score in question is greater than 0.5000. Obviously this shaded area is greater than 0.5000 by the amount of area under the curve between the ordinate located at x/σ = 1.2 and the ordinate at x/σ = 0. Since the table value of this area is 0.3849, the size of the shaded area is 0.8849. Therefore the x/σ standard score of +1.2 is equivalent to the eighty-ninth percentile rank.

Fig. 17. Proportions of the Area Under the Normal Curve Determined by the Ordinate at x/σ = 1.2.
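The conversion to a percentile rank, with rounding upward as in the text, can be sketched in Python under the same assumption that `statistics.NormalDist` stands in for the printed table (the function name is hypothetical):

```python
import math
from statistics import NormalDist

def percentile_rank(x_over_sigma):
    """Percentile rank of an x/sigma standard score, assuming normally
    distributed raw scores and rounding upward as the text does."""
    area_to_left = NormalDist().cdf(x_over_sigma)
    return math.ceil(100 * area_to_left)

print(percentile_rank(-0.5))  # 31
print(percentile_rank(1.2))   # 89
```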


By reversing the process, percentile ranks can be changed to x/σ standard scores, although only approximate standard scores result. For example, a percentile rank of 54 is equivalent to an x/σ standard score of 0.0754 to 0.1004 inclusive. Since percentile ranks, when computed, are always rounded upward, such approximations are to be expected. It should be noted that the foregoing values of 0.0754 and 0.1004 can be most readily found in a modified table of the normal curve, such as the one shown in the Appendix. The percentile rank to be converted is entered in the column labeled p of the table.
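Reversing the process means finding the two standard scores whose left-hand areas bound the percentile rank. A Python sketch, assuming `statistics.NormalDist.inv_cdf` in place of the modified table (the function name is hypothetical):

```python
from statistics import NormalDist

def standard_score_limits(rank):
    """Lower and upper x/sigma limits for a percentile rank that was rounded
    upward: the area to the left lies above (rank - 1)/100 and at most rank/100."""
    if not 2 <= rank <= 99:
        raise ValueError("rank must be between 2 and 99")
    unit = NormalDist()
    return unit.inv_cdf((rank - 1) / 100), unit.inv_cdf(rank / 100)

low, high = standard_score_limits(54)
print(round(low, 4), round(high, 4))  # 0.0753 0.1004
```

The printed limits agree with the 0.0754 and 0.1004 of the text to the third decimal place.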
Further uses of the table of the normal curve are presented in other chapters as the occasion demands. For instance, one of the more obvious applications of the table, i.e., the determination of whether a given distribution of values differs from a normal distribution, is described in the chapter on chi square. Other important applications are developed in the discussion of discriminant analysis. To facilitate the use of the table of the normal curve, modifications of it have been constructed, one of which has already been mentioned. Shown in the Appendix are tables containing other entries as well. One of the applications of z is readily demonstrated in the following discussion, in which descriptive units are to be quantified under the assumption that the characteristic is normally distributed.


DESCRIPTIVE UNITS

Often data are expressed in descriptive units. When data are so expressed, it is impossible to find … or relationships unless some technique is employed to change descriptive units into numerical units. The necessary transformation is made by making some assumption relative to the shape of the distribution of the characteristic, or trait, in the population for which ratings are available. Common practice suggests the assumption of a normal curve. It is possible, however, to assume a distribution of any shape. The advantage in assuming a normal distribution lies in the accessibility of tables for making the necessary transformation. To assume some other type of distribution requires the development of a formula for changing ordinate heights, or areas under a curve, into corresponding base-line units. Fortunately, in most cases, little relative difference is found when numerical units are substituted for descriptive units, whether they are computed by assuming a normal curve or by assuming some other shape of distribution which appears more logical.

An example showing the type of situation in which some technique is necessary for changing descriptive units into numerical units may be helpful. In Table 14 the personality ratings which have been assigned to 25 students by five instructors in a teachers college are given. Inspection of this table indicates, as might be expected, that some instructors show a tendency to rate higher or lower than others. Thus, instructor C shows a tendency to rate lower and instructor E to rate higher than the other three instructors.

It is obvious that an excellent rating is difficult to obtain from instructor C. A student rated excellent by C should therefore, on an average, receive a higher numerical score than one receiving a similar rating from instructor E. To obtain a composite of the personality ratings of the five instructors, it is desirable to find numerical equivalents for the descriptive units assigned by each instructor.

TABLE 14. Personality Ratings of Students (Descriptive Units)
INSTRUCTOR
STUDENT A B C D E
1 Good Good Average Good Excellent
2 Average Good Poor Good Good
3 Excellent Excellent Excellent Excellent Excellent
4 Poor Poor Poor Poor Average
5 Average Excellent Good Average Excellent
6 Good Good Good Good Good
7 Excellent Excellent Average Good Excellent
8 Excellent Good Average Excellent Excellent
9 Average Average Poor Poor Average
10 Poor Poor Poor Poor Poor
11 Good Good Average Good Good
12 Excellent Excellent Excellent Excellent Excellent
13 Average Average Poor Average Good
14 Poor Poor Poor Poor Poor
15 Excellent Average Average Good Excellent
16 Average Average Poor Average Good
17 Average Average Poor Poor Average
18 Good Good Average Good Good
19 Good Average Good Good Good
20 Excellent Good Good Good Good
21 Good Excellent Average Average Average
22 Average Good Average Good Excellent
23 Good Good Average Average Good
24 Excellent Good Excellent Good Excellent
25 Excellent Good Poor Average Excellent

The steps involved are:

1. Make some assumption concerning the shape of the distribution. (Generally, a normal curve is assumed.)
2. Compute the percentage of ratings falling in each category (excellent, good, average, and so on).
3. Place these percentages under the normal curve, or other curve selected, with the lowest descriptive category on the left.
4. Find the mean sigma distance for those cases appearing in each category.

If a normal curve is assumed, the procedure necessary to obtain numerical equivalents for ratings given by instructor A consists in finding the percentage of students who receive each rating. Instructor A's distribution of ratings follows:

Excellent   32 per cent
Good        28 per cent
Average     28 per cent
Poor        12 per cent


The percentages are placed under the normal curve, as shown in Figure 18. To determine the mean sigma distance for each rating category the following formula is used:

$$\text{Mean sigma distance} = \frac{z_1 - z_2}{p}$$

where

z₁ = height of the ordinate on the left or lower extremity of the category
z₂ = height of the ordinate on the right or higher extremity of the category
p = the proportion of the cases in the category

Fig. 18. Personality Ratings by Instructor A. (Categories from left to right: Poor, Average, Good, Excellent.)

By consulting the column of z in the Appendix table, the heights of the three ordinates which divide the distribution into four categories can be obtained. Substitution of the appropriate values in the formula for the mean sigma distance yields,

for Excellent,
$$\text{Mean sigma distance} = \frac{0.3576 - 0.0000}{0.32} = 1.1175$$

for Good,
$$\text{Mean sigma distance} = \frac{0.3863 - 0.3576}{0.28} = 0.1025$$

for Average,
$$\text{Mean sigma distance} = \frac{0.2000 - 0.3863}{0.28} = -0.6654$$

for Poor,
$$\text{Mean sigma distance} = \frac{0.0000 - 0.2000}{0.12} = -1.6667$$
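The four steps can be carried out for any instructor once the category proportions are known. The Python sketch below assumes a normal curve, uses `statistics.NormalDist` in place of the Appendix table to locate the dividing ordinates, and applies the formula (z₁ − z₂)/p; the function name is hypothetical. For instructor A it reproduces the values above to about three decimal places.

```python
from statistics import NormalDist

def mean_sigma_distances(proportions):
    """Mean sigma distance (z1 - z2) / p for each rating category.

    proportions: proportion of ratings in each category, ordered from the
    lowest category (left end of the curve) to the highest (right end).
    Assumes a normal distribution of the trait and proportions summing to 1.
    """
    unit = NormalDist()
    distances = []
    cumulative = 0.0
    z_left = 0.0  # the ordinate at the extreme left has zero height
    for index, p in enumerate(proportions):
        cumulative += p
        if index == len(proportions) - 1:
            z_right = 0.0  # the ordinate at the extreme right has zero height
        else:
            # height of the ordinate dividing this category from the next
            z_right = unit.pdf(unit.inv_cdf(cumulative))
        distances.append((z_left - z_right) / p)
        z_left = z_right
    return distances

# Instructor A, lowest category first: Poor, Average, Good, Excellent
categories = ["Poor", "Average", "Good", "Excellent"]
for name, value in zip(categories, mean_sigma_distances([0.12, 0.28, 0.28, 0.32])):
    print(name, round(value, 4))
```

Entering instructor B's proportions (0.12, 0.24, 0.44, 0.20) gives values close to the B column of the table of mean sigma distances that follows.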
In like manner, the mean sigma distances for each instructor can be determined. The values for each instructor's ratings are shown in Table 15.

TABLE 15. Mean Sigma Distances for Descriptive Ratings
DESCRIPTIVE MEAN SIGMA DISTANCE FOR RATING BY
UNIT A B C D E
Excellent 1.1175 1.4000 1.6667 1.6667 0.9658
Good 0.1025 0.2139 0.8538 0.4418 −0.2094
Average −0.6654 −0.7254 0.1042 −0.4767 −1.0138
Poor −1.6667 −1.6667 −1.0392 −1.4000 −1.8588

If a composite rating is desired, it can now be obtained by computing the mean of the mean sigma distances. It will be noted that computation of this mean weights the ratings of each judge equally. Some other weighting, of course, may be utilized if it appears more logical. Thus, if instructor C has much more time available for interviewing students than do other instructors, it might be desirable to weight his ratings by 2, 3, 4, or some other number which is assumed to be satisfactory from logical considerations. The personality ratings of Table 14 are expressed in numerical units in Table 16, together with the mean rating of each student.


TABLE 16. Personality Ratings of Students (Mean Sigma Distances)
INSTRUCTOR MEAN
STUDENT A B C D E RATING
1 0.1025 0.2139 0.1042 0.4418 0.9658 0.3656
2 −0.6654 0.2139 −1.0392 0.4418 −0.2094 −0.2517
3 1.1175 1.4000 1.6667 1.6667 0.9658 1.3633
4 −1.6667 −1.6667 −1.0392 −1.4000 −1.0138 −1.3573
5 −0.6654 1.4000 0.8538 −0.4767 0.9658 0.4155
6 0.1025 0.2139 0.8538 0.4418 −0.2094 0.2805
7 1.1175 1.4000 0.1042 0.4418 0.9658 0.8059
8 1.1175 0.2139 0.1042 1.6667 0.9658 0.8136
9 −0.6654 −0.7254 −1.0392 −1.4000 −1.0138 −0.9688
10 −1.6667 −1.6667 −1.0392 −1.4000 −1.8588 −1.5263
11 0.1025 0.2139 0.1042 0.4418 −0.2094 0.1306
12 1.1175 1.4000 1.6667 1.6667 0.9658 1.3633
13 −0.6654 −0.7254 −1.0392 −0.4767 −0.2094 −0.6232
14 −1.6667 −1.6667 −1.0392 −1.4000 −1.8588 −1.5263
15 1.1175 −0.7254 0.1042 0.4418 0.9658 0.3808
16 −0.6654 −0.7254 −1.0392 −0.4767 −0.2094 −0.6232
17 −0.6654 −0.7254 −1.0392 −1.4000 −1.0138 −0.9688
18 0.1025 0.2139 0.1042 0.4418 −0.2094 0.1306
19 0.1025 −0.7254 0.8538 0.4418 −0.2094 0.0927
20 1.1175 0.2139 0.8538 0.4418 −0.2094 0.4835
21 0.1025 1.4000 0.1042 −0.4767 −1.0138 0.0232
22 −0.6654 0.2139 0.1042 0.4418 0.9658 0.2121
23 0.1025 0.2139 0.1042 −0.4767 −0.2094 −0.0531
24 1.1175 0.2139 1.6667 0.4418 0.9658 0.8811
25 1.1175 0.2139 −1.0392 −0.4767 0.9658 0.1563




This transformation is made by utilizing for each descriptive unit the 
sigma score equivalents shown in Table 15. 
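Once Table 15 is available, each composite in Table 16 is a lookup followed by a mean. The Python sketch below encodes Table 15 as a dictionary (a hypothetical layout) and reproduces two rows of Table 16; the optional weights illustrate the unequal weighting mentioned earlier, with a hypothetical weight of 2 for instructor C.

```python
# Table 15: mean sigma distance for each instructor and descriptive rating
TABLE_15 = {
    "A": {"Excellent": 1.1175, "Good": 0.1025, "Average": -0.6654, "Poor": -1.6667},
    "B": {"Excellent": 1.4000, "Good": 0.2139, "Average": -0.7254, "Poor": -1.6667},
    "C": {"Excellent": 1.6667, "Good": 0.8538, "Average": 0.1042, "Poor": -1.0392},
    "D": {"Excellent": 1.6667, "Good": 0.4418, "Average": -0.4767, "Poor": -1.4000},
    "E": {"Excellent": 0.9658, "Good": -0.2094, "Average": -1.0138, "Poor": -1.8588},
}

def composite_rating(ratings, weights=None):
    """Mean of the sigma equivalents of one student's ratings.

    ratings: instructor -> descriptive rating, e.g. {"A": "Good", ...}
    weights: optional instructor -> weight; equal weights if omitted.
    """
    if weights is None:
        weights = {instructor: 1 for instructor in ratings}
    total = sum(weights[k] * TABLE_15[k][rating] for k, rating in ratings.items())
    return total / sum(weights[k] for k in ratings)

# Students 1 and 17 of Table 14
student_1 = {"A": "Good", "B": "Good", "C": "Average", "D": "Good", "E": "Excellent"}
student_17 = {"A": "Average", "B": "Average", "C": "Poor", "D": "Poor", "E": "Average"}
print(round(composite_rating(student_1), 4))   # 0.3656
print(round(composite_rating(student_17), 4))  # -0.9688

# Hypothetical weighting: instructor C counted twice
weights = {"A": 1, "B": 1, "C": 2, "D": 1, "E": 1}
print(round(composite_rating(student_1, weights), 4))
```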

A word of warning is necessary concerning the desirability of finding 
the mean of mean sigma distances obtained from descriptive units re- 
ported by various judges. If it can be assumed that all judges are compe- 
tent to rate the characteristic or trait being evaluated, then the procedure 
undoubtedly is of value. On the other hand, no central tendency of ratings 
made by incompetent judges will produce accurate composite ratings. 


Exercises 


1. In the following table scores are shown which have been made by high 
school seniors in two counties on a high school senior examination: 


FREQUENCY FREQUENCY
INTERVAL COUNTY A COUNTY B INTERVAL COUNTY A COUNTY B
162-167 1 3 90-95 20 20
156-161 3 2 84-89 21 11
150-155 1 5 78-83 21 8
144-149 2 9 72-77 18 2
138-143 3 9 66-71 11 4
132-137 8 10 60-65 9 3
126-131 12 15 54-59 10 1
120-125 7 23 48-53 6 0
114-119 13 25 42-47 8 1
108-113 13 32 36-41 3 0
102-107 9 25 30-35 2 0
96-101 19 13 24-29 2 0


a. Compute the quartile deviation of scores in each county. 
b. Compute the standard deviation of the scores in each county. 
c. With respect to scores on this examination, in which county are the seniors more variable?
d. In each county what percentage of the seniors’ scores lie within one 
standard deviation above and below the mean? 
2. In Exercise 1, compute the x/σ standard score of a senior who made a score of 63 in County A.
3. In Exercise 1, what score corresponds to an x/σ standard score of 2.45 in County B?


4. The following table gives the salaries of men and women teachers in certain Southern states:


FREQUENCY 
INTERVALS OF WOMEN MEN 
SALARIES TEACHERS TEACHERS 
$4,350-$4,549 1 3 
4,150- 4,349 0 8 
3,950- 4,149 4 19 
3,750- 3,949 10 25 
3,550- 3,749 12 34 
3,350- 3,549 29 72 
3,150- 3,349 57 94 
2,950- 3,149 112 116 
2,750- 2,949 238 167 
2,550- 2,749 479 228 
2,350- 2,549 684 142 
2,150- 2,349 935 85 
1,950- 2,149 874 47 
1,750- 1,949 325 12 
1,550- 1,749 28 3 


a. Compute the mean salaries for men and for women.
b. Compute the quartile deviation of the salaries for men and for women.
c. Compute the standard deviation of the salaries for men and for women.
d. How would you describe the difference between the salaries for men and those for women in these states?
e. What change would you expect in the standard deviations of salaries if the distributions consisted entirely of high school teachers?
f. What change would you expect in the means of the salaries if the distributions consisted entirely of high school teachers?


5. In the following table reading readiness scores for 30 first-grade pupils are 
recorded: 




PUPIL SCORE PUPIL SCORE PUPIL SCORE 
1 16 11 17 21 87 
2 22 12 37 22 42 
3 77 13 93 23 17 
4 94 14 23 24 53 
5 39 15 78 25 31 
6 49 16 87 26 57 
7 54 17 35 27 24 
8 43 18 20 28 55 
9 54 19 96 29 47 

10 82 20 43 30 67 


a. Compute the mean deviation of the scores. 
b. Compute the standard deviation of the scores. 
c. What percentage of the pupils' scores lie within one standard deviation above and below the mean? Suggest reasons why this value is or is not as large as you would expect.




6. A group of high school pupils and parents was asked to judge 63 motion 
pictures that they had seen. They were to rate them as good, average, or poor. 
The pupils rated 45 motion pictures good, 16 average, 2 poor, while the parents 
rated 9 good, 24 average, 30 poor. 

a. Assume that the quality of these motion pictures follows a normal dis- 
tribution and compute the mean sigma distances for the pupils’ ratings. 
b. Compute the mean sigma distances for the parents’ ratings. 

7. A test in skill in the use of a microscope was developed which enabled an 
instructor to obtain a record of the time required for each student to find an 
object under the microscope, and an evaluation of the quality of the student’s 
mount. The instructor wished to get a single score for skill based upon the mean 
of the standard scores for speed and for quality. He found that the mean time 
required for the class was 145 sec.; the standard deviation was 35 sec.; the 
mean score for quality was 5; and the standard deviation was 1.7. The instructor 
wished to weight both objectives equally in obtaining the composite sigma score. 

a. Compute the single skill score for a student whose time was 2 min. and whose quality was 4.
b. Compute the skill score for one whose time was 45 sec. and whose quality was 3.
c. Compute the skill score for one whose time was 3 min. 15 sec. and whose quality was 7.
Ratings for quality were scored on the basis of one to ten, ten being the highest quality.

8. What percentage of the area under the normal curve is found between ordinates located at the following x/σ standard scores?

a. 0.00 and 1.00
b. −1.00 and 1.00
c. −0.50 and 1.50
d. −1.96 and 1.96
e. −0.67 and 0.15

9. What percentages of the area under the normal curve will be found above and below ordinates located at the following x/σ standard scores?

a. 1.96
b. −0.58
c. 0.87
d. 0.22


5 


Coefficient of Correlation 


Measures of central tendency and variability are useful techniques for 
describing single distributions. There are many problems in education and
psychology, however, which involve the determination of relationship, or 
association, between pairs of values in two distributions. For example, do 
children making high scores on intelligence tests receive high marks in 
school? What is the relationship between speed of reading and reading 
comprehension? Do children from “better” homes attend the motion pic- 
tures more or less frequently than do those from “poorer” homes? Prob- 
lems such as these are investigated by the method of correlation. 

It is evident in the preceding examples that measures of relationship 
are needed in order that comparisons between different distributions may 
be expressed in a manner which will be apparent. The data in any two 
distributions hold the answer to the relationship, or lack of relationship, 
between two characteristics. However, in most cases, this relationship 
will not be apparent from the unorganized masses of data. It is desirable, 
therefore, to obtain some single numerical value from the data which 
will permit a meaningful interpretation of the relationship existing between the variables.

The coefficient of correlation is a single value which is used to represent the relationship between two sets of data representing continuous variables, which have been collected for the same individuals or which can be paired in some manner. In other words, it represents the extent to which changes in one variable are accompanied by equal changes in another, or the degree to which the data when plotted fall into a straight
line. Figure 19 illustrates the appearance of a two-way frequency dis- 
tribution. Points representing the various pairs of X and Y values have 
been plotted, producing the elongated pattern. It is apparent that a 
straight line may be drawn among these data which will, to some extent, 
characterize the existing relationship. The coefficient of correlation is a 
measure which indicates the degree to which these plotted values would 
fall into this straight line. If all values lie on a straight line, the relationship would be said to be perfect and the coefficient of correlation would
be 1.00. This value of the coefficient of correlation would be positive or 
negative, depending upon the direction of the straight line. If the high 
values of one variable are associated with the high values of the other, 
the correlation is said to be positive. If, on the other hand, the high 
values of one variable are associated with the low values of the other, 
the correlation is said to be negative. 

One of the underlying assumptions of the coefficient of correlation is
that the relationship between the two variables being studied is linear.
Figure 20 clearly indicates a close relationship between the two
variables plotted, although no straight line would represent this
relationship. Values in the one distribution may be accurately predicted
from values in the other, even though a straight line cannot be drawn to
indicate this relationship. It is possible to compute a coefficient of
correlation without plotting the distribution, but difficulties in
interpretation are quite likely to ensue if such a procedure is followed.
In the particular case of Figure 20, the computed coefficient of
correlation would be practically zero, although a decided relationship,
nonlinear in character, is evidently present.

Fig. 19. Scatter Diagram of Correlation Chart: Linear Relationship.

Fig. 20. Curvilinear Relationship Between Two Variables.
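The point about Figure 20 can be checked numerically. The following is a minimal Python sketch using hypothetical data (not the data of Figure 20): Y is exactly X squared on points placed symmetrically about zero, so Y is perfectly predictable from X, yet the product-moment coefficient is zero. The helper name `pearson_r` is our own and simply applies the deviation-score definition of r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment r: sum of deviation cross-products over sqrt of the sums of squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical curvilinear data: each Y is fixed exactly by X, but not by a straight line
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r_curved = pearson_r(xs, ys)
print(round(r_curved, 6))
```

The positive cross-products from one side of the curve cancel the negative ones from the other, which is why plotting the data before computing r is recommended.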

To assume without supporting evidence that any relationship between 
two sets of data can be represented by a straight line is dangerous indeed. 
Perhaps the most appropriate procedure for an investigator to follow is to 
plot the data and inspect the scatter of the plotted points. In most cases 
this will provide sufficient evidence for assuming a linear or curvilinear
relationship. However, if there is some doubt about the linearity of the
relationship, a more sensitive test of linearity, to be described in a later
chapter, should be made.




One of the most important uses for a coefficient of correlation is that of 
indicating the extent to which values of one variable may be predicted 
from known values of another variable. Thus, a correlation of 0.50 gives 
some indication of the degree of accuracy which may be expected if 
average marks are predicted from scholastic aptitude test scores. The 
size of a coefficient of correlation varies from -1.00 to +1.00, i.e., from
perfect negative correlation to perfect positive correlation. When there
is no linear relationship present, the coefficient of correlation is zero.

Some textbooks in statistics arbitrarily rank certain correlations as
high, marked, and low. Such an arbitrary ranking is open to serious
question, since the size of a coefficient of correlation can scarcely be
considered apart from the purpose for which it is computed. For example, a
coefficient of correlation of 0.40 between scholastic aptitude test scores
and course marks can by no stretch of the imagination be construed as
high for the purpose of predicting the academic achievement of an
individual. On the other hand, if the purpose were to predict the academic
achievement of a group, a coefficient of correlation of 0.40 would be
extremely high.

Fig. 21. Correlation Chart for Grades 5, 6, 7, and 8.

Fig. 22. Correlation Chart for Grade 8.

Furthermore, the size of a coefficient of correlation in itself is insuffi- 
cient to indicate the extent to which one variable may be predicted from 
another. It constitutes only one of the factors which must be considered 
in this respect. Knowledge of the variability of the group in which the
correlation is computed is as important as knowledge of the size of the
coefficient of correlation. For example, Figure 21 shows the relationship
between scores made on a standardized test in grades five through eight
and scores made on a scholastic aptitude test. The Y-values are the
standardized test scores, whereas the X-values are the scholastic aptitude
scores. The coefficient of correlation in this case is approximately 0.70.
In Figure 22 the same data are shown, except
that grades five, six, and seven have been deleted. It should be noted 
that not all the eighth-grade pupils would fall within the small square 
in Figure 21, nor would all other pupils be excluded from it. Hence, 
Figure 22 should not be considered as a duplication of the small square, 
but an indication of the general location of the majority of the eighth- 
graders. In Figure 22 it is apparent that the correlation has been lowered 
a great deal, yet it is possible to predict the scores of the eighth-grade 
pupils on the standardized test just as accurately from the scholastic 
aptitude test scores when the correlation is 0.40 as it is to predict the 
same scores when the correlation is 0.70 with all four grades included. 
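The effect of group variability can be imitated with simulated scores. The sketch below is a hypothetical illustration, not the data of Figures 21 and 22: it draws Y = 0.7X + noise for 5,000 simulated cases, then keeps only roughly the top quarter of X. As the measure of prediction accuracy it uses the standard error of estimate, s_y·√(1 − r²), which this sketch introduces on its own (it is not derived at this point in the chapter). The function names, seed, sample size, and cut-off are all our assumptions.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment r from deviation cross-products and sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def error_of_estimate(xs, ys):
    """Standard error of estimate s_y * sqrt(1 - r^2), with s_y computed by dividing by N."""
    n = len(ys)
    my = sum(ys) / n
    s_y = sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = pearson_r(xs, ys)
    return s_y * sqrt(1 - r * r)


random.seed(1954)  # fixed seed so the run is repeatable
n = 5000
xs = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical model: Y = 0.7 X + noise, so r over the full range is about 0.70
ys = [0.7 * x + sqrt(1 - 0.7 ** 2) * random.gauss(0, 1) for x in xs]

r_full = pearson_r(xs, ys)
se_full = error_of_estimate(xs, ys)

# Restrict the range: keep only cases in roughly the top quarter of X
# (0.674 is approximately the 75th percentile of a unit normal distribution)
pairs = [(x, y) for x, y in zip(xs, ys) if x > 0.674]
xr = [x for x, _ in pairs]
yr = [y for _, y in pairs]

r_restricted = pearson_r(xr, yr)
se_restricted = error_of_estimate(xr, yr)

print(f"full range:       r = {r_full:.2f}, error of estimate = {se_full:.2f}")
print(f"restricted range: r = {r_restricted:.2f}, error of estimate = {se_restricted:.2f}")
```

Under this model the restricted r should fall to roughly 0.4, while the two errors of estimate stay close to each other (about 0.71), which mirrors the text's point that a lower r in a more homogeneous group need not mean poorer prediction.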

An important characteristic of a coefficient of correlation is that it may 
be used for making comparisons between variables expressed in different 
units. For example, it is perfectly possible to correlate heights with 
weights, i.e., pounds with inches, scores on intelligence tests with heights, 
earliness of walking and size of later vocabulary, and so on. The compu- 
tation of this measure automatically forces the values into units which 
are comparable. On the other hand, the computation also automatically 
disregards the mean value of each variable. For example, suppose that 
a correlation of 0.60 has been found between the weight and height of 
high school pupils. If all pupils have been reported 20 pounds too heavy, 
there will be no change in the coefficient of correlation if recomputed 
using the corrected weight. 
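The invariance described above can be verified directly. This minimal Python sketch uses the Table 17 scores as a stand-in for the height-and-weight example (which gives no data): it adds 20 to every Y-score, as if every value were reported 20 units too high, and separately multiplies every X-score by 10, as in a change of unit. The helper `pearson_r` is our own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment r from deviation cross-products and sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Table 17: scholastic aptitude (X) and general science (Y) scores for 30 pupils
X = [130, 132, 152, 142, 184, 190, 150, 170, 181, 164, 175, 135, 147, 162, 136,
     178, 172, 165, 160, 148, 180, 149, 188, 167, 162, 145, 150, 160, 172, 154]
Y = [20, 24, 28, 23, 37, 32, 25, 23, 29, 35, 32, 22, 24, 26, 21,
     35, 30, 28, 27, 25, 34, 25, 36, 29, 27, 23, 29, 30, 31, 30]

r_original = pearson_r(X, Y)
# Every Y-score reported 20 points too high, like weights reported 20 pounds too heavy
r_shifted = pearson_r(X, [y + 20 for y in Y])
# A change of unit: X multiplied by 10 (any positive multiplier behaves the same way)
r_rescaled = pearson_r([10 * x for x in X], Y)

print(f"{r_original:.4f} {r_shifted:.4f} {r_rescaled:.4f}")
```

Adding a constant moves the mean but leaves every deviation score unchanged, and a positive change of unit scales the numerator and denominator of r by the same factor, so all three values agree.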

A common misconception of the statistically untrained person about 
a coefficient of correlation lies in his tendency to interpret this value as 
a percentage. The coefficient of correlation cannot be interpreted as the 
Percentage of perfect relationship existing between two variables, or as 
the percentage of accuracy with which one variable may be predicted from 
another. Desirable as it would be to have some measure which could be 
interpreted in this manner, the coefficient of correlation, because of the 
very manner in which it is computed, cannot be so interpreted. 

The most serious misconception of the coefficient of correlation lies in 
the belief that it indicates a cause-and-effect relationship. For example, 
suppose that a coefficient of correlation of 0.70 is found between student 
marks in algebra and student marks in physics. The interpretation is 
often made that a good mark in physics is the result of the student’s 
having done good work in algebra. Although this interpretation may be 
correct, the coefficient of correlation by no means proves it. The
coefficient of correlation merely indicates that, for some reason, people
who do good work in the one subject tend to do good work in the other. The
cause is in no way indicated. The danger of inferring a cause-and-effect
relationship from the coefficient of correlation can be illustrated as
follows: over a period of a year there is a high degree of correlation
between the hardness of asphalt pavement and the death rate of infants.
Does the degree of hardness of asphalt paving cause differences in the
death rate of infants, or does the varying death rate of infants cause the
pavement to become soft and hard alternately?

In the foregoing example, it is obvious that the causes of the relation- 
ships are not made explicit in the relationships themselves. It is not so 
obvious, however, when two variables are considered, one of which may 
be the direct result of the other. It must be kept in mind constantly that 
the coefficient of correlation shows relationship between two variables 
but gives no indication as to what factors caused the relationship. 

There are two types of coefficients of correlation in general use. The 
first was developed by Karl Pearson and is called the Pearson product- 
moment coefficient of correlation. The second was developed by Charles 
Spearman and is called the Spearman rank order coefficient. Whenever a 
coefficient of correlation is referred to without specifying the type, with- 
out exception it means the Pearson product-moment coefficient of correla- 
tion. When the Spearman rank order coefficient of correlation has been 
employed, it is often referred to as the rank order coefficient of correla- 
tion. The essential difference between the two types of correlations, 
from the standpoint of interpretation, lies in one single consideration. The 
product-moment type utilizes the values as they actually appear in the 
distributions, whereas the rank order type disregards the values and 
considers only the ranks. If the distance between the highest value and 
the next highest value is proportionally twice as great as the distance 
between the second and third value, the product-moment type takes that 
discrepancy into account, whereas the rank order type disregards it. In 
most cases, these types of coefficients of correlation yield similar values, 
but the former is to be preferred since it considers the raw data which 
the latter disregards, except as they affect the ranks in the distribution. 
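A small hypothetical example shows what "disregarding the distances" means in practice. In the sketch below the last X-Y gap is far larger than the others; the ranks do not register this, so the rank order coefficient, computed with the rank-difference formula ρ = 1 − 6ΣD²/[N(N² − 1)], is exactly 1, while the product-moment coefficient is noticeably lower. The helpers `ranks_desc` and `spearman_rho` are our own names; ties receive the mean of the ranks they occupy.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment r from deviation cross-products and sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks_desc(values):
    """Rank 1 = highest value; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # first rank position taken by this value
        count = ordered.count(v)               # number of cases tied at this value
        ranks.append(first + (count - 1) / 2)  # mean of positions first .. first+count-1
    return ranks


def spearman_rho(xs, ys):
    """Rank order coefficient: 1 - 6 * sum(D^2) / (N (N^2 - 1))."""
    rx, ry = ranks_desc(xs), ranks_desc(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 50]  # hypothetical: the last gap is far larger than the others

print(round(pearson_r(x, y), 2), spearman_rho(x, y))  # 0.74 1.0
```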


DEFINITION METHOD OF COMPUTATION 


The Pearson product-moment coefficient of correlation is obtained by
solving the formula

$$r_{xy} = \frac{\sum xy}{N\sigma_x\sigma_y}$$

where

$r_{xy}$ = coefficient of correlation
$\sum xy$ = sum of the products of the paired scores expressed in deviation form
$N$ = number of cases
$\sigma_x$ = standard deviation of one distribution
$\sigma_y$ = standard deviation of the other distribution
To illustrate the steps in the computation of a coefficient of correlation,
scholastic aptitude test scores and general science test scores for 30
pupils are shown in Table 17.

TABLE 17. General Science and Aptitude Scores for 30 Pupils

          SCHOLASTIC  GENERAL              SCHOLASTIC  GENERAL
          APTITUDE    SCIENCE              APTITUDE    SCIENCE
PUPIL     SCORE X     SCORE Y     PUPIL    SCORE X     SCORE Y
  1         130         20          16       178         35
  2         132         24          17       172         30
  3         152         28          18       165         28
  4         142         23          19       160         27
  5         184         37          20       148         25
  6         190         32          21       180         34
  7         150         25          22       149         25
  8         170         23          23       188         36
  9         181         29          24       167         29
 10         164         35          25       162         27
 11         175         32          26       145         23
 12         135         22          27       150         29
 13         147         24          28       160         30
 14         162         26          29       172         31
 15         136         21          30       154         30

The means and standard deviations of these distributions are as follows:

Aptitude test scores: $\bar{X} = 160$, $\sigma_x = 16.64$
General science test scores: $\bar{Y} = 28$, $\sigma_y = 4.56$

For convenience, the aptitude test scores are called the X-scores and 
the general science test scores the Y-scores. These raw scores are then 
changed to deviation scores by subtracting the appropriate mean scores, 
160 or 28, from each raw X-score or Y-score as in Table 18. The resulting 
deviation scores appear in the columns headed x and y, and the products 
of each pair of deviation scores are shown in the next column under the 
heading xy. The terms needed for the formula are now available. The
formula

$$r_{xy} = \frac{\sum xy}{N\sigma_x\sigma_y}$$

becomes

$$r_{xy} = \frac{1890}{(30)(16.64)(4.56)} = 0.83$$
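The definition method can be traced step by step in a short Python sketch using the Table 17 scores. Following the text, the standard deviations are computed by dividing by N. Variable names are our own.

```python
from math import sqrt

# Table 17: scholastic aptitude (X) and general science (Y) scores for 30 pupils
X = [130, 132, 152, 142, 184, 190, 150, 170, 181, 164, 175, 135, 147, 162, 136,
     178, 172, 165, 160, 148, 180, 149, 188, 167, 162, 145, 150, 160, 172, 154]
Y = [20, 24, 28, 23, 37, 32, 25, 23, 29, 35, 32, 22, 24, 26, 21,
     35, 30, 28, 27, 25, 34, 25, 36, 29, 27, 23, 29, 30, 31, 30]

N = len(X)
mean_x = sum(X) / N  # 160.0
mean_y = sum(Y) / N  # 28.0

# Deviation scores x = X - mean and y = Y - mean (the x and y columns of Table 18)
dev_x = [v - mean_x for v in X]
dev_y = [v - mean_y for v in Y]

sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))  # sum of cross-products, 1890
sigma_x = sqrt(sum(a * a for a in dev_x) / N)      # about 16.64
sigma_y = sqrt(sum(b * b for b in dev_y) / N)      # about 4.56

r = sum_xy / (N * sigma_x * sigma_y)
print(f"sum xy = {sum_xy:.0f}, r = {r:.2f}")  # sum xy = 1890, r = 0.83
```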

The foregoing method of computing a coefficient of correlation is easily 
understood but, in practical situations, is never used. The hypothetical 
scores shown in Table 17 were so chosen that the mean scores for each 
distribution are whole numbers. If these means had been 159.78 and 27.92,
respectively, the amount of labor involved in the foregoing solution would
have increased enormously. Since most distributions yield mean values
which are not whole numbers, the simplicity of this method of solution is
offset by its impracticability.

TABLE 18. Deviation Values of General Science and Aptitude Test Scores

PUPIL    X    Y     x    y     xy    x/σx   y/σy   (x/σx)(y/σy)
  1    130   20   -30   -8    240   -1.80  -1.75       3.15
  2    132   24   -28   -4    112   -1.68  -0.88       1.48
  3    152   28    -8    0      0   -0.48   0.00       0.00
  4    142   23   -18   -5     90   -1.08  -1.10       1.19
  5    184   37    24    9    216    1.44   1.98       2.85
  6    190   32    30    4    120    1.80   0.88       1.58
  7    150   25   -10   -3     30   -0.60  -0.66       0.40
  8    170   23    10   -5    -50    0.60  -1.10      -0.66
  9    181   29    21    1     21    1.26   0.22       0.28
 10    164   35     4    7     28    0.24   1.54       0.37
 11    175   32    15    4     60    0.90   0.88       0.79
 12    135   22   -25   -6    150   -1.50  -1.32       1.98
 13    147   24   -13   -4     52   -0.78  -0.88       0.69
 14    162   26     2   -2     -4    0.12  -0.44      -0.05
 15    136   21   -24   -7    168   -1.44  -1.54       2.22
 16    178   35    18    7    126    1.08   1.54       1.66
 17    172   30    12    2     24    0.72   0.44       0.32
 18    165   28     5    0      0    0.30   0.00       0.00
 19    160   27     0   -1      0    0.00  -0.22       0.00
 20    148   25   -12   -3     36   -0.72  -0.66       0.48
 21    180   34    20    6    120    1.20   1.32       1.58
 22    149   25   -11   -3     33   -0.66  -0.66       0.44
 23    188   36    28    8    224    1.68   1.75       2.94
 24    167   29     7    1      7    0.42   0.22       0.09
 25    162   27     2   -1     -2    0.12  -0.22      -0.03
 26    145   23   -15   -5     75   -0.90  -1.10       0.99
 27    150   29   -10    1    -10   -0.60   0.22      -0.13
 28    160   30     0    2      0    0.00   0.44       0.00
 29    172   31    12    3     36    0.72   0.66       0.48
 30    154   30    -6    2    -12   -0.36   0.44      -0.16
Total                        1890                     24.93

However, whenever both distributions are expressed in standard scores,
this formula needs only slight modification to yield a feasible method.
The usual formula

$$r_{xy} = \frac{\sum xy}{N\sigma_x\sigma_y}$$

may be rewritten

$$r_{xy} = \frac{\sum \left(\frac{x}{\sigma_x}\right)\left(\frac{y}{\sigma_y}\right)}{N}$$

When the data in the last three columns of Table 18 are substituted into
it, the formula becomes

$$r_{xy} = \frac{24.93}{30} = 0.83$$


Thus, the coefficient of correlation between two distributions is the mean 
of the products of the paired standard scores. This method is used when 
values are already expressed in standard-score form. However, unless 
standard scores are needed for some other purpose, a coefficient of correla- 
tion is not obtained in this manner in practical situations. 
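The "mean of the products of standard scores" form can be sketched in a few lines of Python with the Table 17 data. The helper `standard_scores` is our own and divides by N, as the text does. Because it uses unrounded standard scores, the sum of the products comes to about 24.91 rather than the 24.93 of Table 18, which was accumulated from rounded values; the resulting r is 0.83 either way.

```python
from math import sqrt

# Table 17: scholastic aptitude (X) and general science (Y) scores for 30 pupils
X = [130, 132, 152, 142, 184, 190, 150, 170, 181, 164, 175, 135, 147, 162, 136,
     178, 172, 165, 160, 148, 180, 149, 188, 167, 162, 145, 150, 160, 172, 154]
Y = [20, 24, 28, 23, 37, 32, 25, 23, 29, 35, 32, 22, 24, 26, 21,
     35, 30, 28, 27, 25, 34, 25, 36, 29, 27, 23, 29, 30, 31, 30]


def standard_scores(values):
    """Convert raw scores to standard scores (x / sigma), sigma computed with divisor N."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


N = len(X)
zx = standard_scores(X)
zy = standard_scores(Y)

sum_products = sum(a * b for a, b in zip(zx, zy))
r = sum_products / N  # the mean of the paired products
print(f"sum of products = {sum_products:.2f}, r = {r:.2f}")
```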


DEVIATION SCORE METHOD OF COMPUTATION 


In the definition method of computing the coefficient of correlation, each
raw score must be converted into a deviation score, and the standard
deviations of both distributions must also be computed. In many cases time
would be saved by a method of computation that is based on deviation
scores yet does not require any single deviation score to be calculated.

For this purpose, the definition formula can be revised so that only the 
crossproducts and the sums of squares of the deviation scores are necessary 
for the solution of the value of the coefficient of correlation. 


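Since

$$\sigma_x = \sqrt{\frac{\sum x^2}{N}}, \qquad \sigma_y = \sqrt{\frac{\sum y^2}{N}}, \qquad r_{xy} = \frac{\sum xy}{N\sigma_x\sigma_y}$$

then

$$r_{xy} = \frac{\sum xy}{N\sqrt{\frac{\sum x^2}{N}}\sqrt{\frac{\sum y^2}{N}}}$$

Simplifying,

$$r_{xy} = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}}$$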
This is perhaps the most widely used formula for determining the value 
of the correlation coefficient. In the case in which the raw data are 
ungrouped it is invariably applied. 
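It has been previously demonstrated that

$$\sum x^2 = \sum X^2 - \frac{(\sum X)^2}{N} \qquad \text{and} \qquad \sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{N}$$

By following similar reasoning it can be shown that

$$\sum xy = \sum XY - \frac{(\sum X)(\sum Y)}{N}$$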

Consequently, to solve for an r-value by means of the foregoing formula,
only five terms are necessary, that is, $\sum XY$, $\sum X$, $\sum X^2$,
$\sum Y$, and $\sum Y^2$.
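The five raw-score sums and the deviation sums derived from them can be sketched in Python for the Table 17 data. No individual deviation score is formed; only the totals enter the formulas. Variable names are our own.

```python
from math import sqrt

# Table 17: scholastic aptitude (X) and general science (Y) scores for 30 pupils
X = [130, 132, 152, 142, 184, 190, 150, 170, 181, 164, 175, 135, 147, 162, 136,
     178, 172, 165, 160, 148, 180, 149, 188, 167, 162, 145, 150, 160, 172, 154]
Y = [20, 24, 28, 23, 37, 32, 25, 23, 29, 35, 32, 22, 24, 26, 21,
     35, 30, 28, 27, 25, 34, 25, 36, 29, 27, 23, 29, 30, 31, 30]
N = len(X)

# The five raw-score sums (the totals row of a sums-and-crossproducts worksheet)
sum_X = sum(X)
sum_Y = sum(Y)
sum_X2 = sum(v * v for v in X)
sum_Y2 = sum(v * v for v in Y)
sum_XY = sum(a * b for a, b in zip(X, Y))

# Deviation sums obtained from the raw sums alone
sxy = sum_XY - sum_X * sum_Y / N   # sum of xy
sxx = sum_X2 - sum_X ** 2 / N      # sum of x squared
syy = sum_Y2 - sum_Y ** 2 / N      # sum of y squared

r = sxy / sqrt(sxx * syy)
print(sum_X, sum_Y, sum_X2, sum_Y2, sum_XY)  # 4800 840 776304 24144 136290
print(sxy, sxx, syy, round(r, 2))            # 1890.0 8304.0 624.0 0.83
```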

For the purpose of illustrating the computation, the data in Table 17
are again chosen. These data are transferred to the worksheet form shown
in Table 19 in order to obtain $\sum XY$, $\sum X$, $\sum Y$, $\sum X^2$,
and $\sum Y^2$. The terms are as follows:

$N = 30$, $\sum X = 4{,}800$, $\sum Y = 840$
$\sum X^2 = 776{,}304$, $\sum Y^2 = 24{,}144$, $\sum XY = 136{,}290$

TABLE 19. General Science and Aptitude Test Scores 
(Sums of Squares and Crossproducts Work Sheet) 


PUPIL X Y XY X² Y²

1 130 20 2,600 16,900 400 
2 132 24 3,168 17,424 576 
3 152 28 4,256 23,104 784 
4 142 23 3,266 20,164 529 
5 184 37 6,808 33,856 1,369 
6 190 32 6,080 36,100 1,024 
7 150 25 3,750 22,500 625 
8 170 23 3,910 28,900 529 
9 181 29 5,249 32,761 841 
10 164 35 5,740 26,896 1,225 
11 175 32 5,600 30,625 1,024 
12 135 22 2,970 18,225 484 
13 147 24 3,528 21,609 576 
14 162 26 4,212 26,244 676 
15 136 21 2,856 18,496 441 
16 178 35 6,230 31,684 1,225 
17 172 30 5,160 29,584 900 
18 165 28 4,620 27,225 784 
19 160 27 4,320 25,600 729 
20 148 25 3,700 21,904 625 
21 180 34 6,120 32,400 1,156 
22 149 25 3,725 22,201 625 
23 188 36 6,768 35,344 1,296 
24 167 29 4,843 27,889 841 
25 162 27 4,374 26,244 729 
26 145 23 3,335 21,025 529 
27 150 29 4,350 22,500 841 
28 160 30 4,800 25,600 900 
29 172 31 5,332 29,584 961 
30 154 30 4,620 23,716 900 
Total 4,800 840 136,290 776,304 24,144 


When a calculator is used to obtain the necessary terms, the sums are
ordinarily carried in the machine and only the totals recorded.
Substituting the foregoing values in the appropriate equations:

$$\sum xy = 136{,}290 - \frac{(4{,}800)(840)}{30} = 1{,}890$$

$$\sum x^2 = 776{,}304 - \frac{(4{,}800)^2}{30} = 8{,}304$$

$$\sum y^2 = 24{,}144 - \frac{(840)^2}{30} = 624$$
The r-value can be determined by substituting these values in the
deviation-form equation

$$r_{xy} = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}} = \frac{1{,}890}{\sqrt{(8{,}304)(624)}} = 0.83$$


The resulting value is the same as that obtained by the definition method
formula. In many instances it will be more convenient to use the
mathematically identical formula

$$r_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

for machine computation than other formulas for $r_{xy}$. When the formula
is written in this manner, the numerator and each of the two bracketed
terms of the denominator can be obtained in one operation on a
calculator.
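A Python sketch of this machine formula on the Table 17 data follows. The numerator and both bracketed terms are computed in whole-number arithmetic, so nothing is rounded until the final division. Variable names are our own.

```python
from math import sqrt

# Table 17: scholastic aptitude (X) and general science (Y) scores for 30 pupils
X = [130, 132, 152, 142, 184, 190, 150, 170, 181, 164, 175, 135, 147, 162, 136,
     178, 172, 165, 160, 148, 180, 149, 188, 167, 162, 145, 150, 160, 172, 154]
Y = [20, 24, 28, 23, 37, 32, 25, 23, 29, 35, 32, 22, 24, 26, 21,
     35, 30, 28, 27, 25, 34, 25, 36, 29, 27, 23, 29, 30, 31, 30]
N = len(X)

sX, sY = sum(X), sum(Y)
sX2 = sum(v * v for v in X)
sY2 = sum(v * v for v in Y)
sXY = sum(a * b for a, b in zip(X, Y))

numerator = N * sXY - sX * sY   # N*sum(XY) - sum(X)*sum(Y)
bracket_x = N * sX2 - sX ** 2   # N*sum(X^2) - (sum X)^2
bracket_y = N * sY2 - sY ** 2   # N*sum(Y^2) - (sum Y)^2

r = numerator / sqrt(bracket_x * bracket_y)
print(numerator, bracket_x, bracket_y, round(r, 2))  # 56700 249120 18720 0.83
```

Each integer term is exactly N times the corresponding deviation sum (30 × 1,890, 30 × 8,304, and 30 × 624), so the factors of N cancel and the same r results.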


CONSTANT ADJUSTMENT AND RAW SCORE 
COMPUTATION 


Occasionally all the values in one of the distributions involved in the 
computation of a coefficient of correlation will be large, thus resulting in 
rather unwieldy calculations and sums to be substituted into the formula.
The size of the sums can be reduced considerably by subtracting a
constant from each raw score in one or both distributions. The constant 
is selected arbitrarily, usually by choosing a multiple of ten equal to or 
smaller than the smallest raw-score value in the distribution. Further- 
more, the same constant need not be selected for each distribution. A 
constant which is a multiple of ten is suggested merely for convenience 
in computation. 

To illustrate the computation of the coefficient of correlation when 
constants have been subtracted from the raw scores, the worksheet in 
Table 20 has been prepared in which the constant 20 has been subtracted 
from each raw score in the Y-distribution and the constant 130 from 
each X-score. 

TABLE 20. Constant Adjustment of General Science and Aptitude Test Scores

PUPIL    X    Y   X-130  Y-20  (X-130)²  (Y-20)²  (X-130)(Y-20)
  1    130   20     0     0         0        0           0
  2    132   24     2     4         4       16           8
  3    152   28    22     8       484       64         176
  4    142   23    12     3       144        9          36
  5    184   37    54    17     2,916      289         918
  6    190   32    60    12     3,600      144         720
  7    150   25    20     5       400       25         100
  8    170   23    40     3     1,600        9         120
  9    181   29    51     9     2,601       81         459
 10    164   35    34    15     1,156      225         510
 11    175   32    45    12     2,025      144         540
 12    135   22     5     2        25        4          10
 13    147   24    17     4       289       16          68
 14    162   26    32     6     1,024       36         192
 15    136   21     6     1        36        1           6
 16    178   35    48    15     2,304      225         720
 17    172   30    42    10     1,764      100         420
 18    165   28    35     8     1,225       64         280
 19    160   27    30     7       900       49         210
 20    148   25    18     5       324       25          90
 21    180   34    50    14     2,500      196         700
 22    149   25    19     5       361       25          95
 23    188   36    58    16     3,364      256         928
 24    167   29    37     9     1,369       81         333
 25    162   27    32     7     1,024       49         224
 26    145   23    15     3       225        9          45
 27    150   29    20     9       400       81         180
 28    160   30    30    10       900      100         300
 29    172   31    42    11     1,764      121         462
 30    154   30    24    10       576      100         240
Total             900   240    35,304    2,544       9,090

Since the deviation method formula can easily be expressed in terms of
raw scores, i.e.,

$$r_{xy} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$

the adjusted sums of squares and crossproducts in Table 20 can be
substituted directly:

$$r_{xy} = \frac{9{,}090 - \frac{(900)(240)}{30}}{\sqrt{\left[35{,}304 - \frac{(900)^2}{30}\right]\left[2{,}544 - \frac{(240)^2}{30}\right]}} = \frac{1{,}890}{\sqrt{(8{,}304)(624)}} = 0.83$$


It will be noted that the correlation coefficient obtained from the
adjusted values is identical with that obtained by using the raw scores
themselves.
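The constant adjustment of Table 20 can be sketched in Python: subtract 130 from each X-score and 20 from each Y-score, form the adjusted totals, and apply the raw-score formula to both the original and the adjusted scores. The helper `r_raw` is our own name for the raw-score form of the product-moment formula.

```python
from math import sqrt


def r_raw(X, Y):
    """Raw-score form of the product-moment formula."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y)) - sx * sy / n
    sxx = sum(a * a for a in X) - sx ** 2 / n
    syy = sum(b * b for b in Y) - sy ** 2 / n
    return sxy / sqrt(sxx * syy)


# Table 17: scholastic aptitude (X) and general science (Y) scores for 30 pupils
X = [130, 132, 152, 142, 184, 190, 150, 170, 181, 164, 175, 135, 147, 162, 136,
     178, 172, 165, 160, 148, 180, 149, 188, 167, 162, 145, 150, 160, 172, 154]
Y = [20, 24, 28, 23, 37, 32, 25, 23, 29, 35, 32, 22, 24, 26, 21,
     35, 30, 28, 27, 25, 34, 25, 36, 29, 27, 23, 29, 30, 31, 30]

CX, CY = 130, 20  # the constants subtracted in Table 20
Xa = [v - CX for v in X]
Ya = [v - CY for v in Y]

totals = (sum(Xa), sum(Ya), sum(a * a for a in Xa),
          sum(b * b for b in Ya), sum(a * b for a, b in zip(Xa, Ya)))
print(totals)  # (900, 240, 35304, 2544, 9090), the Total row of Table 20

print(round(r_raw(X, Y), 2), round(r_raw(Xa, Ya), 2))  # 0.83 0.83

# The original means are recovered by adding the constants back
mean_X = sum(Xa) / len(Xa) + CX  # 160.0
mean_Y = sum(Ya) / len(Ya) + CY  # 28.0
```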

If the means of the original values from which constants have been
subtracted are wanted, they can be obtained by adding each constant to
the mean of the corresponding adjusted values as follows:

$$\bar{X} = \frac{\sum (X - C_x)}{N} + C_x, \qquad \bar{Y} = \frac{\sum (Y - C_y)}{N} + C_y$$

where $C_x$ and $C_y$ are the constants used in the X- and Y-distributions,
respectively.¹ Substituting the data from the foregoing example into the
formulas, they become

$$\bar{X} = \frac{900}{30} + 130 = 160, \qquad \bar{Y} = \frac{240}{30} + 20 = 28$$

The standard deviations of the distributions remain unchanged, however.²
Since this is true, it can readily be shown that the coefficient of
correlation remains unchanged.³ It should be emphasized that the only
advantage of adjusting the raw data by subtracting a constant is the time
saved in computation, since the procedures are mathematically equivalent.


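¹ Let $\bar{X}'$ = mean of the coded values. Then

$$\bar{X}' = \frac{\sum (X - C)}{N} = \frac{\sum X}{N} - C = \bar{X} - C, \qquad \text{so} \qquad \bar{X} = \bar{X}' + C$$

² Let $x'$ = deviation of a coded value from the coded mean. Then

$$x' = (X - C) - (\bar{X} - C) = X - \bar{X} = x$$

so that $\sum x'^2 = \sum x^2$ and

$$\sigma_{x'} = \sqrt{\frac{\sum x'^2}{N}} = \sqrt{\frac{\sum x^2}{N}} = \sigma_x$$

³ By definition

$$r_{xy} = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}}$$

Since $x' = x$ and $y' = y$, it follows that $\sum x'y' = \sum xy$,
$\sum x'^2 = \sum x^2$, and $\sum y'^2 = \sum y^2$; hence

$$r_{x'y'} = \frac{\sum x'y'}{\sqrt{(\sum x'^2)(\sum y'^2)}} = r_{xy}$$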


COMPUTATION OF THE COEFFICIENT OF 
CORRELATION FROM A TWO-WAY 
FREQUENCY DISTRIBUTION 


When a calculator is not available, the amount of labor required for
computing a coefficient of correlation can sometimes be reduced by
plotting a two-way distribution and using it as a basis for the computation.
Furthermore, the plotting of a two-way frequency distribution yields 
first-hand information concerning the linearity or nonlinearity of a re- 
lationship between the variables. 


[Correlation chart: X-intervals across the top, Y-intervals down the side.]


To plot the required two-way frequency table, each distribution is 
grouped into convenient intervals. The values for each subject are then
tallied, as shown on the foregoing form, in which a two-way frequency
table has been built from the data in the foregoing example. For
convenience, the scholastic aptitude scores are grouped into intervals of 
seven and plotted on the X-axis, and the general science scores are 
grouped into intervals of two and plotted on the Y-axis. Specially pre- 
pared correlation charts are available commercially, but a chart similar 
to the foregoing can be readily constructed from graph paper. 

When the data have been plotted and the tallies in each interval re- 
corded in the first row or column, an arbitrary origin is assumed and the 
deviation values recorded in the next row and column. If the origin is 
assumed to be near the center of the distribution, smaller sums will re- 
sult, but negative values will also be encountered. If the origin is assumed 
to be near or at the end of the distribution, larger sums will result, but
negative values will be eliminated or minimized. The deviation values are
next multiplied by the frequencies in each interval and the result re- 
corded in the fd row and column. To obtain fd? the fd values are multi- 
plied by the corresponding deviation values. 

To obtain the $d_xd_y$ values, each tally (or individual) is multiplied by
both its corresponding $d_x$ and $d_y$ values. These products are then
summed for each Y-interval and entered in the $d_xd_y$ column. Thus, for
the two tallies in the Y-interval of 36 and the X-interval of 182, each is
multiplied by 4 ($d_y$) and by 4 ($d_x$) and the products are summed,
(1 × 4 × 4) + (1 × 4 × 4) = 32, and recorded in the appropriate cell of the
$d_xd_y$ column.

The row and column values from the two-way distribution are then
summed and substituted into the following formula:

$$r_{xy} = \frac{N\sum d_xd_y - (\sum fd_x)(\sum fd_y)}{\sqrt{\left[N\sum fd_x^2 - (\sum fd_x)^2\right]\left[N\sum fd_y^2 - (\sum fd_y)^2\right]}}$$

which for the foregoing example yields $r_{xy} = 0.81$.

It will be noted that there is a discrepancy between this value of 0.81
and the value of 0.83 which was obtained by the deviation score method.
The 0.83 r-value is the more accurate, since grouping scores into intervals
introduces some error, although values obtained by computing the
correlation coefficient from a two-way frequency distribution are
sufficiently accurate for most purposes.
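The grouped computation can be sketched in Python for the Table 17 data. The exact interval limits of the original chart are not recoverable here, so this sketch assumes X-intervals of 7 beginning at 126 (so that 182 to 188 is one interval) and Y-intervals of 2 beginning at 20 (so that 36 to 37 is one interval), with the arbitrary origin in the fifth interval of each variable; these choices give d = 4 for both the 182 and 36 intervals, as in the text. Summing d-values over individuals is equivalent to summing f times d over intervals. With these assumed intervals the sketch gives about 0.81.

```python
from math import sqrt

# Table 17: scholastic aptitude (X) and general science (Y) scores for 30 pupils
X = [130, 132, 152, 142, 184, 190, 150, 170, 181, 164, 175, 135, 147, 162, 136,
     178, 172, 165, 160, 148, 180, 149, 188, 167, 162, 145, 150, 160, 172, 154]
Y = [20, 24, 28, 23, 37, 32, 25, 23, 29, 35, 32, 22, 24, 26, 21,
     35, 30, 28, 27, 25, 34, 25, 36, 29, 27, 23, 29, 30, 31, 30]

# Assumed grouping (hypothetical reconstruction of the chart's intervals)
X_START, X_WIDTH = 126, 7  # X-intervals 126-132, 133-139, ..., 182-188, 189-195
Y_START, Y_WIDTH = 20, 2   # Y-intervals 20-21, 22-23, ..., 36-37
ORIGIN = 4                 # arbitrary origin: the fifth interval of each variable

# Deviation (in interval units) of each pupil's interval from the arbitrary origin
dx = [(v - X_START) // X_WIDTH - ORIGIN for v in X]
dy = [(v - Y_START) // Y_WIDTH - ORIGIN for v in Y]

N = len(X)
sum_fdx = sum(dx)
sum_fdy = sum(dy)
sum_fdx2 = sum(d * d for d in dx)
sum_fdy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

num = N * sum_dxdy - sum_fdx * sum_fdy
den = sqrt((N * sum_fdx2 - sum_fdx ** 2) * (N * sum_fdy2 - sum_fdy ** 2))
r_grouped = num / den
print(round(r_grouped, 2))  # 0.81
```

Other reasonable interval limits would give slightly different grouped values; the small loss of accuracy relative to 0.83 comes from treating every score in an interval as if it fell at the same point.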


COMPUTATION OF THE COEFFICIENT OF 
CORRELATION FROM RANKS 


Although it is not essential that either or both distributions be normally 
distributed in order to compute and interpret a Pearson product-moment 
coefficient of correlation, it is necessary to ascertain that the values of 
any one variable are reasonably close together; otherwise a coefficient of
correlation may be found which is difficult to interpret. For example,
if per pupil cost were to be correlated with number of pupils per school 
system in the state of Illinois, the Chicago schools would be so out of line 
that the correlation coefficient would be determined largely by one city. 
Whenever it appears more logical to disregard the size of the values in the 
distribution because of extreme nonnormality, a coefficient of correlation 
computed from ranks is more appropriate than one computed by the 
usual method. 
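An entirely hypothetical illustration of the problem of one extreme case: nine ordinary cases with no particular trend, plus one case far out of line with the rest. In this sketch the extreme pair drives the product-moment coefficient close to 1, while the rank-based coefficient stays far lower. The data and helper names are our own; ties receive the mean of the ranks they occupy.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment r from deviation cross-products and sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks_desc(values):
    """Rank 1 = highest value; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_rho(xs, ys):
    """Rank order coefficient: 1 - 6 * sum(D^2) / (N (N^2 - 1))."""
    rx, ry = ranks_desc(xs), ranks_desc(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical data: nine ordinary cases and one extreme case (the last pair)
size = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
cost = [5, 3, 4, 6, 2, 5, 3, 4, 6, 50]

r = pearson_r(size, cost)
rho = spearman_rho(size, cost)
print(f"product-moment r = {r:.2f}, rank-order rho = {rho:.2f}")  # 0.99 and 0.38
```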

There are some instances in which the original data will be collected 
in the form of ranks. When this is true, the Spearman rank order method
of solution can obviously be applied. For example, supervisors in industry
are sometimes asked to rank their foremen in order of effectiveness. 
Cadet-teacher supervisors are asked to rank the cadet teachers under 
their supervision according to some characteristic. These rankings then 
are used as a criterion against which to validate selection procedures, 
and so forth. 

The formula for the Spearman rank order coefficient of correlation is

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where

$\rho$ = coefficient of correlation ($\rho$ = rho)
$D$ = difference between the ranks
$N$ = number of cases


TABLE 21. General Science Achievement and Aptitude Test Rankings
for 30 Pupils

                 RANK
         GENERAL     SCHOLASTIC
PUPIL    SCIENCE     APTITUDE        D        D²
  1        30           30          0.0      0.00
  2        23.5         29         -5.5     30.25
  3        15.5         19         -3.5     12.25
  4        26           26          0.0      0.00
  5         1            3         -2.0      4.00
  6         6.5          1          5.5     30.25
  7        21           20.5        0.5      0.25
  8        26           10         16.0    256.00
  9        13            4          9.0     81.00
 10         3.5         13         -9.5     90.25
 11         6.5          7         -0.5      0.25
 12        28           28          0.0      0.00
 13        23.5         24         -0.5      0.25
 14        19           14.5        4.5     20.25
 15        29           27          2.0      4.00
 16         3.5          6         -2.5      6.25
 17        10            8.5        1.5      2.25
 18        15.5         12          3.5     12.25
 19        17.5         16.5        1.0      1.00
 20        21           23         -2.0      4.00
 21         5            5          0.0      0.00
 22        21           22         -1.0      1.00
 23         2            2          0.0      0.00
 24        13           11          2.0      4.00
 25        17.5         14.5        3.0      9.00
 26        26           25          1.0      1.00
 27        13           20.5       -7.5     56.25
 28        10           16.5       -6.5     42.25
 29         8            8.5       -0.5      0.25
 30        10           18         -8.0     64.00
Total                                       732.50


The procedure to be followed in the computation will be shown using the
data in Table 17. The scores are changed to ranks, the highest score in
each distribution receiving rank 1, as shown in Table 21. Whenever two or
more individuals receive the same score in the original distributions, the
ranks which these cases would occupy are averaged and the arithmetic
mean of the ranks involved is assigned to each. For example, since pupil
number 10 and pupil number 16 both have a score of 35 on the general
science test and their ranks would be 3 and 4, the two ranks are averaged
arithmetically and a rank of 3.5 is assigned to each pupil. Rank 5 is then
assigned to pupil number 21, whose score is 34. Substituting in the
formula,

$$\rho = 1 - \frac{6(732.50)}{30(30^2 - 1)} = 1 - \frac{4{,}395}{26{,}970} = 0.84$$

With these data, the value obtained differs but little from the value of
the product-moment coefficient of correlation, 0.83.
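The ranking and the rank-difference computation can be reproduced with a short Python sketch on the Table 17 scores. The helper `ranks_desc` is our own: it gives rank 1 to the highest score and assigns tied scores the mean of the ranks they would occupy, as described above. It reproduces the D² total of Table 21.

```python
# Table 17: scholastic aptitude (X) and general science (Y) scores for 30 pupils
X = [130, 132, 152, 142, 184, 190, 150, 170, 181, 164, 175, 135, 147, 162, 136,
     178, 172, 165, 160, 148, 180, 149, 188, 167, 162, 145, 150, 160, 172, 154]
Y = [20, 24, 28, 23, 37, 32, 25, 23, 29, 35, 32, 22, 24, 26, 21,
     35, 30, 28, 27, 25, 34, 25, 36, 29, 27, 23, 29, 30, 31, 30]


def ranks_desc(values):
    """Rank 1 = highest score; tied scores share the mean of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # first rank position taken by this score
        count = ordered.count(v)               # number of pupils tied at this score
        ranks.append(first + (count - 1) / 2)  # e.g. two pupils at ranks 3 and 4 -> 3.5
    return ranks


rank_science = ranks_desc(Y)
rank_aptitude = ranks_desc(X)

N = len(X)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_science, rank_aptitude))
rho = 1 - 6 * sum_d2 / (N * (N ** 2 - 1))
print(sum_d2, round(rho, 2))  # 732.5 0.84
```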


Exercises 


1. Two instructors independently assigned final course marks to the same
class of 31 students in beginning educational psychology. The marking
system used marks from 1 to 9, with 9 the highest mark.

              INSTRUCTOR
STUDENT       A       B
   1          8       8
   2          4       3
   3          5       5
   4          6       5
   5          4       4
   6          8       8
   7          8       9
   8          7       5
   9          7       6
  10          6       5
  11          5       5
  12          3       4
  13          3       4
  14          5       4
  15          4       3
  16          7       5
  17          7       6
  18          7       7
  19          9       7
  20          3       2
  21          2       2
  22          8       6
  23          5       5
  24          4       4
  25          6       5
  26          5       5
  27          1       1
  28          7       5
  29          6       5
  30          2       2
  31          8       7


a. Compute the coefficient of correlation between the marks assigned by 
these two instructors. 
b. Prepare a two-way frequency table of these marks and note the linearity 
of the relationship and the degree to which the points scatter. 
c. Compute the coefficient of correlation from this two-way distribution. 
2. A statistics class was given a computational test and an interpretational 
test over the same units in a statistics course. The following raw scores resulted: 


STUDENT INTERPRETATION COMPUTATION 
1 33 28 
2 32 26 
3 31 29 
4 30 18 
5 30 25 
6 30 25 
7 29 23 
8 29 26 
9 28 25 

10 28 27 
11 27 16 
12 27 20 
13 27 23 
14 26 18 
15 26 20
16 26 23 
17 26 25 
18 25 21 
19 25 22 
20 24 18 
21 24 22 
22 24 26 
23 23 15 
24 23 20 
25 22 23 
26 21 16 
27 21 18 
28 21 19 
29 20 21 
30 19 16 
31 19 17 
32 19 22 
33 17 20 
34 15 15 


Compute the coefficient of correlation between these two sets of scores.

3. Show algebraically that $\sum xy = \sum XY - \frac{(\sum X)(\sum Y)}{N}$.

4. The following are intelligence quotients obtained from the administration
of a paper-and-pencil test of intelligence and from a performance test of
intelligence.


STUDENT PERFORMANCE TEST PAPER-AND-PENCIL TEST 
1 83 80 
2 105 87 
3 97 98 
4 108 88 
5 112 106 
6 112 84 
7 94 84 
8 96 85 
9 115 88 

10 126 120 
11 104 84 
12 139 91 
13 92 93
14 113 124 
15 74 87 
16 112 114 
17 107 62 
18 129 99 
19 85 76 
20 139 125 
21 124 121 
22 107 104 
23 106 E 
24 119 

25 111 104 
26 113 94 
27 75 78 
28 101 111 




Compute the coefficient of correlation between these two sets of I.Q.'s.


6 


Classical Theory of Sampling


Many of our everyday activities involve the process of drawing con- 
clusions about a whole based upon an examination of some part of this 
whole. For example, a housewife may buy a head of lettuce after examining
its outside leaves, a basket of apples after examining those in the
top layer, a watermelon by examining a plug which has been cut from 
it. In fact, her decision as to whether to return to this particular grocer 
subsequently for more fruit and vegetables may depend upon her satis- 
faction with the products which she has bought to date. An employment 
manager may select employees on the basis of the behavior which they 
display during an interview. Conversely, an employee may decide whether 
to return to work after the first week or month on the basis of his work- 
ing conditions during this time. A psychologist may recommend a par- 
ticular type of therapy for a child by observing and testing the child in 
selected situations. A teacher may decide whether to use a particular 
teaching method by trying it with some of his pupils and then drawing a
conclusion about its future effectiveness. All the foregoing examples have
two processes in common: (1) examining and comprehending the
characteristics of a part, selected in some manner, and (2) making an 
inference about the whole from which this part was obtained. 

Although few, if any, of the foregoing examples involve evidence of a 
numerical nature, the two processes involved have a direct parallel in 
statistical methodology. Essentially on the basis of (1) summarizing 
numerical statements of evidence, and (2) inferring from a selected part, 
called a sample, to a whole, called a population or universe, statistical 
methodology may be classified into two major subdivisions, descriptive 
and sampling statistics. Descriptive statistics involves the computation 
of summary values, such as means, standard deviations, percentiles, and 
coefficients of correlation, for reporting characteristics of the group
studied. Thus, if the purpose of a research project is that of reporting
findings about a group, with no attempt to generalize these findings, then
only descriptive statistics will be used. Since the purpose of descriptive statistics
is summarization, however, any inferences or conclusions drawn are open 
to serious question unless further statistical treatment is undertaken. 

Seldom is the research worker content to report only his findings about 
the group studied. A study takes on importance when suitable inferences
can be drawn for groups larger than the group included in an
investigation. Regardless of the field of inquiry, research studies, with few
exceptions, are sampling investigations.

For instance, in agriculture, an investigation designed to test the
effectiveness of two different fertilizers on corn yield is of little importance
unless inferences can be drawn with greater scope than for the yield in 
the experiment. Although one fertilizer may be more satisfactory than an- 
other, as far as these experimental yields are concerned, it is unimportant 
unless generalization is possible to other crop yields under conditions 
similar to those prevailing in the experiment. 

In a similar way, an investigation concerning the effectiveness of large and small classes in any high school subject is unimportant unless inferences can be drawn which are applicable to other groups. Without such inferences, the findings apply only to the classes already taught, and it is then too late to rectify any disadvantage which accompanies large or small class procedure. It is difficult indeed to conceive of many important research studies in education and psychology which are not sampling studies, i.e., research studies whose inferences must apply far beyond the group included in the investigation.

The development and application of techniques for (1) selecting appropriate samples from a population; (2) analyzing sample data so that inferences can be made beyond the elements studied; and (3) determining the amount of confidence to be placed in such inferences have long been included in the subject matter of statistics. The earlier attempts at developing these techniques resulted in a sampling theory which has come to be referred to as the “classical theory of sampling.” Within the past thirty years, however, considerable refinement has been made in the techniques of sampling statistics. These later developments, called “modern statistical inference,” date back to the pioneering work of R. A. Fisher, an English statistician, whose work first appeared in textbook form¹ in 1925. The first applications of Fisher’s work were made in the agricultural and biological sciences, and it is in those fields that modern statistical inference has developed. In the United States, Snedecor’s monumental work,² followed by four editions of his book on statistical methods, has done much to stimulate the application of appropriate methods to problems in the agricultural and biological sciences. In spite of the rapid strides made in these fields, modern statistical inference has made much less impact on the fields of education and psychology. During the past few years, however, an ever-increasing number of studies utilizing the newer developments of statistical inference have appeared.

¹ R. A. Fisher, Statistical Methods for Research Workers (Edinburgh, Oliver and Boyd, Ltd., 1925).
² George W. Snedecor, Calculation and Interpretation of Analysis of Variance and Covariance (Ames, Iowa, Collegiate Press, Inc., 1934).

Although this book takes the viewpoint throughout that the methods of modern statistical inference should be followed whenever conclusions are drawn, its indebtedness to the classical theory of sampling is recognized.

The remainder of this chapter is devoted to a consideration of the 
classical theory of sampling. An appraisal of this theory of sampling 
provides much more than a historical record of suitable sampling tech- 
niques. It is probable that no adequate concept of modern statistical 
inference can be obtained without considerable understanding of the 
classical theory from which it originated. 


CHARACTERISTICS OF THE CLASSICAL THEORY 
OF SAMPLING 


The classical theory of sampling has been developed by considering the consequences of drawing random samples¹ from a single homogeneous population of known statistical characteristics, such as the mean and the standard deviation, and noting how far the sample values depart from these known characteristics from sample to sample. It may be pointed out that research situations in education and psychology in which the population mean and standard deviation are known are, for all practical purposes, nonexistent. The classical theory of sampling rests upon these known population characteristics or upon satisfactory estimates of such characteristics which can be obtained from a single sample.

It has long been recognized by research workers that the assumption 
of a single homogeneous population is basic to the classical theory of 
sampling. To satisfy this assumption most studies have been limited by 
choosing samples in such a manner that partial homogeneity ensues. Any 
inferences drawn, obviously, must be limited to populations similar to 
those from which the group sampled might have been randomly selected. 
Thus, most studies dealing with achievement in arithmetic have been 
limited to one grade level and the interpretations made in terms of this 
grade level only. In some studies further delimitation has been made by 
including only one school, only one sex, or only one socio-economic level. 
To the degree that such delimitation has been made, the ensuing inferences have become more and more useful. On the other hand, the necessity for such delimitation, in many cases, cannot be justified either from theoretical considerations or from fragmentary statistical analysis.

¹ A random sample is defined as a sample so drawn that every member of the population has an equal chance of being chosen in the sample. Discussion of this and other types of samples is included in the following chapter.

TABLE 22. Estimates of Population Mean and Variance from
Randomly Drawn Samples of 25 Cases

SAMPLE    MEAN VARIANCE      SAMPLE    MEAN VARIANCE
     1   67.88   349.03          51   79.26   548.66
     2   68.68   570.81          52   79.52   421.34
     3   70.68   631.89          53   79.72   537.79
     4   70.92   518.45          54   79.88   511.69
     5   71.00   697.58          55   79.92   448.24
     6   71.04   544.96          56   80.32   673.81
     7   71.56   215.17          57   80.44   396.76
     8   72.84   457.22          58   80.48   736.43
     9   73.04   487.37          59   80.52   442.43
    10   73.32   375.73          60   80.68   385.14
    11   73.96   616.62          61   80.72   598.54
    12   74.24   433.77          62   80.76   289.02
    13   74.28   576.88          63   81.08   302.08
    14   74.40   566.00          64   81.36   603.57
    15   74.44   503.76          65   81.40   480.42
    16   74.68   453.98          66   81.88   598.94
    17   74.68   253.31          67   82.08   472.16
    18   74.84   551.97          68   82.08   714.74
    19   74.88   467.94          69   82.16   406.56
    20   75.00   659.92          70   82.36   523.24
    21   75.16   403.39          71   82.48   366.76
    22   75.68   503.64          72   82.72   434.54
    23   75.72   722.13          73   82.80   555.08
    24   75.76   661.27          74   82.80   564.58
    25   75.88   381.61          75   82.96   678.71
    26   75.92   588.58          76   83.04   429.33
    27   76.36   332.07          77   83.08   370.70
    28   76.84   677.14          78   83.16   725.11
    29   76.88   492.11          79   83.24   534.44
    30   77.12   228.61          80   83.32   578.48
    31   77.18   505.54          81   83.36   543.69
    32   77.20   395.17          82   83.40   474.43
    33   77.68   528.48          83   83.44   632.48
    34   77.92   496.33          84   83.64   748.99
    35   78.24   601.52          85   83.76   465.11
    36   78.24   256.77          86   84.28   497.24
    37   78.44   487.59          87   84.32   572.76
    38   78.52   458.80          88   84.84   607.71
    39   78.52   382.93          89   85.16   436.94
    40   78.60   556.42          90   86.00   480.66
    41   78.64   531.74          91   86.08   554.25
    42   78.72   540.54          92   87.00   502.27
    43   78.72   436.04          93   87.04   495.81
    44   78.66   413.52          94   87.48   572.24
    45   78.84   478.72          95   87.84   534.47
    46   78.88   428.03          96   88.24   601.34
    47   78.88   575.78          97   88.68   600.37
    48   78.96   500.37          98   89.04   528.48
    49   79.12   326.86          99   90.20   511.69
    50   79.16   471.89         100   92.44   540.54

The classical theory of sampling can be illustrated by an example in 
which random samples of test scores have been drawn from a population 
of geography test scores. The population of test scores consisted of more 
than 9,000 geography test scores for fifth-grade pupils who participated 
in a state-wide testing program. These test scores had a mean of 79.30 
and a standard deviation of 24.21. By drawing a random sample from 
this population it is possible to estimate the population mean and stand- 
ard deviation from the sample characteristics. Any inferences drawn, 
however, must be limited to fifth-grade pupils. It is also assumed in the 
classical theory of sampling that no further subdivision of the population on some basis such as sex or school is necessary; i.e., a single homogeneous distribution of geography achievement is present.

From the population of fifth-grade geography test scores, 100 samples of 25 scores each were drawn by the use of a table of random numbers,¹ and the mean and variance of each sample were computed; these are shown in Table 22. After the characteristics of each sample were noted, the sample was replaced in the population. For convenience, the samples have been arranged according to the size of their estimates of the mean. An inspection of these estimates, or sample means, which vary from 67.88 to 92.44, indicates clearly the impossibility of identifying the population mean from any single sample estimate. The population mean of 79.30 does not coincide with any one of the 100 sample estimates; 51 of them are smaller and 49 are greater than the population mean.
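The sampling experiment behind Table 22 can be imitated by computer. The 9,000 actual geography scores are not reproduced in the text, so the sketch below assumes a hypothetical population of 9,000 normally distributed scores with mean 79.30 and standard deviation 24.21; the function name `draw_sample_means` and the seeds are illustrative only. Each sample of 25 is drawn without repeating a pupil within the sample, and the population is left intact between samples, as in the replacement described above.

```python
import random
import statistics


def draw_sample_means(population, n_samples=100, sample_size=25, seed=1):
    """Draw repeated random samples (each returned to the population
    before the next draw) and return the sample means sorted by size."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)  # no pupil twice within one sample
        means.append(statistics.mean(sample))
    return sorted(means)  # arranged by size, as in Table 22


# Hypothetical stand-in for the 9,000 fifth-grade scores (assumed normal).
pop_rng = random.Random(0)
population = [pop_rng.gauss(79.30, 24.21) for _ in range(9000)]

means = draw_sample_means(population)
print(f"smallest mean {means[0]:.2f}, largest mean {means[-1]:.2f}")
print(f"standard deviation of the 100 means {statistics.stdev(means):.3f}")
```

With a different `seed` the 100 means change, but their spread should stay in the neighborhood of 24.21/√25 = 4.842.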

The classical theory of sampling has been based upon the tendency of an infinite number of randomly drawn samples to yield a sampling distribution of estimates of the population mean that follows the normal curve. The normal curve is centered over the population mean, and the standard deviation of the sample estimates is found by dividing the population standard deviation by the square root of the number of cases used in each estimate.

The standard deviation of the sample estimates is called the standard error of the mean and may be obtained from the formula

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$

where $\sigma_{\bar{X}}$ is the standard error of the mean, $\sigma$ is the standard deviation in the population, and $N$ is the number of cases in the sample. For the present example of 25 cases in a sample and a population standard deviation of 24.21,

$$\sigma_{\bar{X}} = \frac{24.21}{\sqrt{25}} = \frac{24.21}{5} = 4.842$$

¹ A method of using a table of random numbers is described in the following chapter.


Thus, it could be expected that approximately 68 out of every 100 samples of 25 drawn at random from this population would yield estimates that differ from the population mean of 79.30 by no more than 4.842. Actually, 70 of the 100 estimates shown in Table 22 lie within one standard error of the population mean. The degree to which the actual distribution of estimates fits the theoretical normal curve may be seen from the following:


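σ_X̄        POPULATION LIMITS    SAMPLE NUMBER     NO. OF SAMPLE MEANS    THEORETICAL
DISTANCE   (MEAN = 79.30)       AT LIMIT                                 NUMBER
           LOWER     UPPER      LOWER    UPPER    WITHIN    WITHOUT      WITHOUT
  0.5      76.88     81.72        28       65       37        63           62
  1.0      74.46     84.14        15       85       70        30           32
  1.5      72.04     86.56         7       91       84        16           14
  2.0      69.62     88.98         2       97       95         5            5
  2.5      67.19     91.41         0       99       99         1            1

The sample-number columns refer to the order of Table 22: the lower figure is the last sample whose mean falls below the lower limit, and the upper figure is the last sample whose mean falls below the upper limit.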
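The comparison above can be recomputed from the 100 means of Table 22. This is a minimal sketch; the function names are hypothetical, the means are typed in as printed in Table 22, and the normal-curve proportion outside ±z standard errors is taken from the complementary error function, `math.erfc(z / sqrt(2))`.

```python
import math

# Sample means from Table 22, in sample-number order (1 to 100), as printed.
TABLE_22_MEANS = [
    67.88, 68.68, 70.68, 70.92, 71.00, 71.04, 71.56, 72.84, 73.04, 73.32,
    73.96, 74.24, 74.28, 74.40, 74.44, 74.68, 74.68, 74.84, 74.88, 75.00,
    75.16, 75.68, 75.72, 75.76, 75.88, 75.92, 76.36, 76.84, 76.88, 77.12,
    77.18, 77.20, 77.68, 77.92, 78.24, 78.24, 78.44, 78.52, 78.52, 78.60,
    78.64, 78.72, 78.72, 78.66, 78.84, 78.88, 78.88, 78.96, 79.12, 79.16,
    79.26, 79.52, 79.72, 79.88, 79.92, 80.32, 80.44, 80.48, 80.52, 80.68,
    80.72, 80.76, 81.08, 81.36, 81.40, 81.88, 82.08, 82.08, 82.16, 82.36,
    82.48, 82.72, 82.80, 82.80, 82.96, 83.04, 83.08, 83.16, 83.24, 83.32,
    83.36, 83.40, 83.44, 83.64, 83.76, 84.28, 84.32, 84.84, 85.16, 86.00,
    86.08, 87.00, 87.04, 87.48, 87.84, 88.24, 88.68, 89.04, 90.20, 92.44,
]

POP_MEAN, POP_SD, N = 79.30, 24.21, 25
SE_MEAN = POP_SD / math.sqrt(N)  # 24.21 / 5 = 4.842


def normal_two_tail(z):
    """Proportion of a normal curve lying more than z standard deviations
    from its center (both tails together)."""
    return math.erfc(z / math.sqrt(2))


def compare_with_normal(means, distances=(0.5, 1.0, 1.5, 2.0, 2.5)):
    """For each distance in standard-error units, count the sample means
    within POP_MEAN +/- distance * SE_MEAN and the number expected outside
    those limits per 100 samples under the normal curve."""
    rows = []
    for z in distances:
        lower = POP_MEAN - z * SE_MEAN
        upper = POP_MEAN + z * SE_MEAN
        within = sum(lower <= m <= upper for m in means)
        expected_without = 100 * normal_two_tail(z)
        rows.append((z, lower, upper, within, len(means) - within, expected_without))
    return rows


for z, lower, upper, within, without, expected in compare_with_normal(TABLE_22_MEANS):
    print(f"{z:3.1f}  {lower:6.2f} {upper:6.2f}  within {within:3d}"
          f"  without {without:3d}  normal curve {expected:5.1f}")
```

The counts of means within the limits agree with the table. At 1.5 standard errors the normal curve gives about 13.4 means outside, which would round to 13 rather than the 14 shown.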


The classical theory of sampling, when applied to a sample not too small in size, is theoretically accurate for the sampling distribution of means whenever (1) the population variates constitute a single homogeneous population; (2) the population variates are normally distributed; (3) the sample has been chosen at random; and (4) the mean and standard deviation of the population are known. The first two of these limitations are not too serious whenever the ultimate purpose is to obtain an estimate of the population mean from a sample. The requirement of random selection is a prerequisite for interpretations from classical sampling theory as well as from modern statistical inference. The requirement that the mean and standard deviation of the population be known is a condition which, for all practical purposes, is never met in studies in education and psychology.

The classical theory of sampling has been developed with the assumption that the estimates of the mean and standard deviation of a population obtained from a sample may be substituted for the needed population values. An inspection of Table 22 reveals that the 100 estimates of the population mean of 79.30, when obtained from samples of 25, vary from 67.88 to 92.44. Estimates of the population variance likewise vary among these samples, from 215.17 in sample No. 7 to 748.99 in sample No. 84, with corresponding standard deviations of 14.67 and 27.37. These estimates of the population variance have been obtained, in some deference to modern statistical inference, by using 24 rather than 25, that is, N − 1 rather than N, as the denominator reflecting sample size. The standard error of the mean obtained from the data in Table 22 therefore varies from 2.934 to 5.474.
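The arithmetic behind these figures can be sketched briefly. The helper name below is hypothetical; the two variances are the extremes from Table 22, and the eight scores used to contrast the N − 1 and N denominators are made up for illustration. Python's `statistics.variance` divides by N − 1, while `statistics.pvariance` divides by N.

```python
import math
import statistics


def standard_error_from_variance(variance, n):
    """Estimated standard error of the mean: sqrt(variance) / sqrt(n)."""
    return math.sqrt(variance) / math.sqrt(n)


# Smallest and largest variance estimates in Table 22 (samples 7 and 84).
for variance in (215.17, 748.99):
    sd = math.sqrt(variance)
    se = standard_error_from_variance(variance, 25)
    print(f"variance {variance:7.2f}  SD {sd:5.2f}  SE of mean {se:.3f}")

# N - 1 versus N in the denominator, on a small hypothetical set of scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.variance(scores))   # sum of squared deviations / (N - 1) = 32 / 7
print(statistics.pvariance(scores))  # sum of squared deviations / N = 32 / 8
```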

The most frequent interpretations of a sample mean and its standard 
error have been made from probabilities obtained from a normal curve 
centered over the sample mean rather than the correct but unknown 
population mean. From logical considerations, this shift in centering the 
normal curve is indefensible. From practical considerations, too little 
evidence is available to indicate the ensuing damage to appropriate inter- 
pretations. In any case, the methods indicated in the following chapters 
are more appropriate than those of classical sampling theory. 

Standard errors can be obtained for statistical measures other than the mean, but each requires a different formula. The standard error of a median can be found from the formula

$$\sigma_{\text{med}} = \frac{1.25\,\sigma}{\sqrt{N}}$$

and the standard error of a proportion from the formula

$$\sigma_{p} = \sqrt{\frac{pq}{N}}$$

where $p$ is the proportion concerned and $q$ is $1 - p$. This formula is satisfactory unless either $p$ or $q$ is less than about 0.05 or 0.10.

The standard error of a standard deviation, found from the formula

$$\sigma_{\sigma} = \frac{\sigma}{\sqrt{2N}}$$

is none too satisfactory, since the sampling distribution of the standard deviation does not follow the normal curve.

The standard error of a coefficient of correlation is found from the formula

$$\sigma_{r} = \frac{1 - r^{2}}{\sqrt{N}}$$

where r is the coefficient of correlation in the population. Since the 
sampling distributions of coefficients of correlation differ radically from 
a normal distribution except when the population coefficient is zero, the 
use of this formula is extremely limited. The formula is shown here for its 
historical importance. More adequate techniques for the treatment of the 
coefficient of correlation in samples will be described in a later chapter. 
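The four standard-error formulas can be collected as small functions. This is a minimal sketch: the function names are hypothetical, and apart from the geography figures (σ = 24.21, N = 25) the arguments below are illustrative values chosen for easy checking. As the text cautions, the last two formulas are of limited use because the corresponding sampling distributions are not normal.

```python
import math


def se_median(sigma, n):
    """Standard error of a median: 1.25 * sigma / sqrt(N)."""
    return 1.25 * sigma / math.sqrt(n)


def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * q / N), with q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)


def se_sd(sigma, n):
    """Standard error of a standard deviation: sigma / sqrt(2N)."""
    return sigma / math.sqrt(2 * n)


def se_r(r, n):
    """Classical standard error of a correlation: (1 - r**2) / sqrt(N)."""
    return (1 - r ** 2) / math.sqrt(n)


print(se_median(24.21, 25))     # median of a sample of 25 geography scores
print(se_proportion(0.5, 100))  # hypothetical proportion of 0.5 in 100 cases
print(se_sd(10.0, 50))          # hypothetical sigma of 10 with 50 cases
print(se_r(0.0, 100))           # population r of zero, 100 cases
```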

Within one standard error distance above and below the population mean, 68 out of 100 means obtained from random samples taken from the entire population will theoretically be included. In an ever-decreasing number of studies it has been the practice to report the probable error rather than the standard error of the mean. The two measures differ only in the size of the unit. The probable error may be obtained by multiplying the standard error by 0.6745. When laid off on both sides of the population mean, the probable error distance includes 50 per cent of the means from successive random samples.
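The factor 0.6745 is simply the normal-curve distance that encloses the middle 50 per cent of the area, that is, the 75th percentile of the standard normal curve. A short sketch using Python's `statistics.NormalDist`, applied (as an illustration) to the standard error of 4.842 from the geography example:

```python
from statistics import NormalDist

# Distance z such that the middle 50 per cent of a normal curve lies within +/- z.
pe_factor = NormalDist().inv_cdf(0.75)
print(round(pe_factor, 4))  # 0.6745

# Check: the area between -0.6745 and +0.6745 is about one half.
print(NormalDist().cdf(0.6745) - NormalDist().cdf(-0.6745))

se_mean = 24.21 / 25 ** 0.5          # standard error of the mean, geography example
probable_error = 0.6745 * se_mean
print(round(probable_error, 3))
```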

The degree of confidence which can be placed in a sample mean as an estimate of the population mean, or in any other sample statistical measure as an estimate of its population value, is much less useful in education and psychology than in many other fields of inquiry. Most research studies involve the comparison of two sample means, or of two other statistical measures.


THE DIFFERENCE BETWEEN TWO SAMPLE MEANS 


Whenever two sample means are compared, the question arises, “Is the difference noted (1) so small that it might have resulted from individual differences among the cases drawn for the two samples, or (2) so large that it is unreasonable to expect the discrepancy to have so resulted?” The answer to this question is expressed in terms of probability, and the procedure has been designated a test of significance.

The steps in making the test of significance, according to the classical theory of sampling, are: (1) note the difference in means; (2) compute the standard error of the mean for each sample; (3) compute the standard error of the difference between the two means from the formula*

$$\sigma_{D_{\bar{X}}} = \sqrt{\sigma_{\bar{X}_1}^{2} + \sigma_{\bar{X}_2}^{2}}$$

(4) obtain the ratio of the difference in the two means to the standard error of the difference; and (5) note the area under the normal curve lying beyond plus and minus the sigma distance given by this ratio. This area constitutes the probability that random sampling alone would produce a difference at least as large as the one observed if the population means were identical.

* This formula results from the general formula

$$\sigma_{D_{\bar{X}}} = \sqrt{\sigma_{\bar{X}_1}^{2} + \sigma_{\bar{X}_2}^{2} - 2r\,\sigma_{\bar{X}_1}\sigma_{\bar{X}_2}}$$

whenever $r = 0$, i.e., whenever the samples are uncontrolled and independent of each other. In situations in which the individuals have been paired on some characteristic, the correlation term cannot be ignored. It should be pointed out that in such situations $r$ is ordinarily positive, since it represents the degree to which the value for one member of a pair can be predicted from that of the other.

The prevailing practice, until about twenty years ago, was to demand that the ratio of the difference to its standard error, called the “critical ratio,” be 3.00 or more for significance, i.e., before the hypothesis that no difference between the means existed in the populations from which the samples were taken could be regarded as untenable. It should be obvious that the requirement of a ratio of 3.00, or of any other ratio, is highly arbitrary. During the past twenty years research workers have modified this requirement downward and have tended to express their findings in terms of a probability rather than a ratio. The probabilities most often quoted have been the 5 per cent level, designated as significant, and the 1 per cent level, designated as highly significant. The 5 per cent level implies that, if the population means are identical, the probability of obtaining from random sampling two means as different as, or more different than, the ones actually found is five in one hundred. It bears repeating that any ratio or probability chosen as significant is highly arbitrary.
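The five steps can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the two means and standard errors are invented so that the arithmetic is easy to follow, and the optional `r` argument applies only to paired individuals, as described in the footnote. The last lines show the ratios that correspond to the 5 and 1 per cent levels and the probability implied by the older 3.00 rule.

```python
import math
from statistics import NormalDist


def se_difference(se1, se2, r=0.0):
    """Standard error of the difference between two means.
    r = 0 for independent samples; r is the correlation between
    paired values when individuals have been matched."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


def critical_ratio_test(mean1, se1, mean2, se2, r=0.0):
    """Steps (1)-(5): difference, its standard error, the critical ratio,
    and the two-tailed area beyond that ratio on the normal curve."""
    diff = mean1 - mean2
    ratio = diff / se_difference(se1, se2, r)
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(ratio)))
    return ratio, p_two_tailed


# Hypothetical means and standard errors.
ratio, p = critical_ratio_test(85.0, 3.0, 75.0, 4.0)
print(f"critical ratio = {ratio:.2f}, two-tailed p = {p:.4f}")

# Ratios needed at the 5 and 1 per cent levels, and p for the old 3.00 rule.
print(NormalDist().inv_cdf(0.975), NormalDist().inv_cdf(0.995))
print(2 * (1 - NormalDist().cdf(3.0)))
```

Under these assumptions a ratio of about 1.96 marks the 5 per cent level and about 2.58 the 1 per cent level, while the older requirement of 3.00 corresponds to a probability of roughly 0.003.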

Research workers using the classical theory of sampling, as well as those using modern statistical inference, agree that significance as here defined should not be confused with importance. Large samples may yield differences between two means which are significant, that is, which could not reasonably be attributed to fluctuations of random sampling, and yet the difference might be of little or no practical importance. On the other hand, in small samples, differences in means which are nonsignificant might be of considerable importance if these differences exist in the populations concerning which conclusions are to be drawn.
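The dependence of significance on sample size can be shown with one fixed difference. The sketch assumes two independent groups of equal size drawn from populations with the same known standard deviation (24.21, borrowed from the geography example); the 2-point difference, the group sizes, and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def two_tailed_p(diff, sigma, n_per_group):
    """Two-tailed normal-curve probability for a difference between two
    independent means, each based on n_per_group cases, sigma known."""
    se_diff = math.sqrt(2 * sigma ** 2 / n_per_group)
    return 2 * (1 - NormalDist().cdf(abs(diff) / se_diff))


# The same 2-point difference judged with small and with very large groups.
for n in (25, 2500):
    print(n, round(two_tailed_p(2.0, 24.21, n), 4))
```

The difference is exactly as large, and just as trivial in practical terms, in both cases; only the larger groups make it "significant."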

For many years the classical theory of sampling has been used for testing hypotheses in sampling surveys. It is in this respect, perhaps, that research workers in the social sciences have frequently drawn erroneous conclusions. Among the assumptions necessary for appropriate use of the classical theory of sampling, the one requiring a sample drawn from a single homogeneous population is, in most situations, entirely untenable.

A hypothetical example, much less complicated than those usually found in sampling surveys, may indicate the pitfalls into which many careless research workers have stumbled. A survey involving both boys and girls in the fourth and eighth grades included a score on a scholastic aptitude test for each individual. Both boys and girls had a mean score of 36 in the fourth grade and 72 in the eighth grade. There were 20 boys and 80 girls in the fourth grade and 80 boys and 20 girls in the eighth grade. The mean score for all boys, then, was (20 × 36 + 80 × 72)/100 = 64.8, and for all girls (80 × 36 + 20 × 72)/100 = 43.2, apparently indicating that boys are superior in scholastic aptitude. The fallacy in this conclusion, obviously, lies in the disproportionate numbers of boys and girls in the fourth and eighth grades. It is apparent that the conclusion would be reversed had the numbers of boys and girls in the two grades been interchanged.
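The arithmetic of this hypothetical survey can be laid out cell by cell. The sketch below uses only the figures given in the text; the function names are hypothetical. Comparing the sexes within each grade, rather than pooling across grades, removes the distortion caused by the unequal numbers. It is offered only to expose the fallacy, not as the formal test of significance discussed next.

```python
# Hypothetical survey from the text: (grade, sex) -> (number of pupils, mean score)
cells = {
    (4, "boys"): (20, 36.0), (4, "girls"): (80, 36.0),
    (8, "boys"): (80, 72.0), (8, "girls"): (20, 72.0),
}


def pooled_mean(sex):
    """Mean for one sex, ignoring grade (each cell weighted by its size)."""
    total_n = sum(n for (grade, s), (n, m) in cells.items() if s == sex)
    total = sum(n * m for (grade, s), (n, m) in cells.items() if s == sex)
    return total / total_n


def within_grade_differences():
    """Boys' mean minus girls' mean, grade by grade."""
    return {g: cells[(g, "boys")][1] - cells[(g, "girls")][1] for g in (4, 8)}


print(pooled_mean("boys"), pooled_mean("girls"))  # 64.8 43.2
print(within_grade_differences())                 # {4: 0.0, 8: 0.0}
```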

Whenever the sample is stratified no further than in the foregoing example, modern statistical inference will provide a satisfactory test of significance. Unfortunately, most sampling surveys are more complicated than the hypothetical one here described. Surveys dealing with human beings are especially difficult to evaluate. There are so many characteristics, often highly interrelated and occurring in disproportionate numbers of cases in subgroups, that the use of classical sampling theory for testing various hypotheses in any given sample is almost impossible. The practice of testing first one hypothesis and then another without regard to their interrelationship is indefensible. If not too many strata exist in a sample, modern statistical inference, to be discussed later, may yield satisfactory approximations. Appropriate statistical analyses of sampling surveys are complicated and should be undertaken only by the research worker sophisticated in statistical methodology.


IMPORTANCE OF THE CLASSICAL THEORY 
OF SAMPLING 


The classical theory of sampling is slowly fading from the research scene. This change is not due to lack of soundness in its underlying theory but rather to the failure of research situations to parallel the basic assumptions needed for its use. In actual research problems, the population mean and standard deviation are unknown. Estimates of these values from a sample are slightly biased when interpreted in terms of the normal curve, although without bias when interpreted in terms of a t-distribution, as suggested almost fifty years ago by an English statistician writing under the pen name of “Student.”¹

It remained, however, for Fisher to develop suitable statistical methodology for research studies involving a sample from a population comprised of definite subdivisions or strata. The Fisher impetus has so greatly modified statistical methodology that the traditional classical theory of sampling is no longer acceptable, since so few studies involve samples that may be considered a random sample from a single homogeneous population. Thus the demand arises for the design of studies which will permit available statistical models to be employed. The problems involved in designing research studies and selecting appropriate models represent the major emphasis throughout this book. No attempt is made to explore the variety of designs which have been found useful in other fields of research, nor has any attempt been made to utilize all the various models which have been proposed by mathematical statisticians. The statistical methodology which follows is directed toward the types of situations most frequently encountered in educational and psychological research, without regard to the principles of design and statistical models found useful in other research areas.

¹ “Student,” “The Probable Error of a Mean,” Biometrika, Vol. VI (1908), pp. 1-25.


Exercises


1. Differentiate between a sampling distribution and the distribution of a sample.

2. Is a standard error necessarily a standard deviation? Explain.

3. If the size of the samples used to develop Table 22 had been 250 instead of 25, would you expect the range of the sample means to increase or decrease? Why?

4. Using the 100 variances listed in Table 22, construct a 10-interval frequency distribution. On the basis of this frequency distribution, plot the approximate shape of the sampling distribution of the variances. Would you suggest that the shape of this sampling distribution differs from that described for sample means? Explain.

5. Compute the standard deviations of the 100 samples listed in Table 22 and 
construct a 10-interval frequency distribution. Plot the approximate shape of 
the sampling distribution of standard deviations. 

a. Describe how the shape of this curve differs from that described for 
sample means. 

b. How does it differ, if at all, from the shape of the sampling distribution 
of variances found in Exercise 4? 

6. What per cent of the 100 sample variances in Table 22 overestimate the 
population variance? Suggest reasons why this per cent is or is not as large as 
you would expect. 

7. Find the per cent of the 100 sample standard deviations derived from 
Table 22 that are overestimates of the population standard deviation. 

a. Does this per cent differ from that which you expected? 
b. Does this per cent differ greatly from the per cent of the sample vari- 
ances overestimating the population variance, as found in Exercise 6? 

8. Find the means of the sample means, the sample standard deviations, and the sample variances of the 100 samples included in Table 22. Do the three values differ greatly from the three population values, respectively?

9. Using the values reported at the beginning of the chapter for the 9,000 geography test scores, compute the standard error of the standard deviation.

a. Interpret this value as to its function. 

b. Show whether the 100 sample standard deviations derived from Table 
22 do or do not conform to your interpretation of the function of the 
standard error of the standard deviation. 

10. Define a test of significance. 



Statistical Inference— 


Estimation 


The classical theory of sampling discussed in the preceding chapter has slowly given way in educational and psychological research to the developments in sampling theory known as statistical inference. Statistical inference, as the term implies, consists of drawing generalizations from numerical information obtained from sample groups to a larger group from which information has not been obtained. The former group is known as a sample and the latter as a population or universe. Any research study must be so designed that logic is not violated in making the transition from sample findings to population generalizations.

In making this transition, two distinct purposes can be immediately recognized. First, the research study may demand that an estimate of some population descriptive measure, such as the mean, coefficient of correlation, standard deviation, or proportion, be obtained from a sample. This area of statistical inference is called estimation. Second, the research study may demand the testing of some hypothesis, such as one concerning the effectiveness of instruction in classes of different size. This distinction between problems of estimation and those of hypothesis testing should be maintained, since the appropriate sampling designs for these two areas of statistical inference differ widely, except in the unusual case in which the sampling has been made from a single homogeneous population. For estimation, the sample should be so drawn that it is representative of the population concerning which inferences are to be made, subgroup cases being drawn for the sample in the same proportions as they exist in the population. For testing hypotheses, on the other hand, the sample should be so drawn that the numbers of cases in the subgroups are equal.
An example of the difference between estimation and hypothesis testing in the type of sample to be drawn may be found in the salaries of classroom teachers in a state. The teachers may be classified as male and female and as elementary and secondary. Thus the population is not a single homogeneous population, but rather a population made up of four subgroups of teachers: male elementary, female elementary, male secondary, and female secondary teachers.

If the problem being investigated is to estimate from a sample the mean salary of classroom teachers in a state, the statistical inference employed is estimation. For such a purpose, the sample should be so chosen that the number of cases in each of the four cells is proportional to the number occurring in the population. On the other hand, if the purpose is to test the hypothesis that salary is independent of sex or school level, the number of cases in each of the four cells should be equal, regardless of the population distribution.
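The two ways of allocating a sample to the four cells can be sketched as follows. The numbers of teachers in each subgroup and the total sample size of 400 are hypothetical, as are the function names. With other figures, rounding in the proportional version may leave the total a case or two away from the intended sample size.

```python
def proportional_allocation(pop_counts, sample_size):
    """Cases per cell in proportion to the population (for estimation)."""
    total = sum(pop_counts.values())
    return {cell: round(sample_size * n / total) for cell, n in pop_counts.items()}


def equal_allocation(pop_counts, sample_size):
    """Same number of cases in every cell (for testing hypotheses)."""
    per_cell = sample_size // len(pop_counts)
    return {cell: per_cell for cell in pop_counts}


# Hypothetical numbers of classroom teachers in a state.
teachers = {
    "male elementary": 500, "female elementary": 4500,
    "male secondary": 2000, "female secondary": 3000,
}
print(proportional_allocation(teachers, 400))
print(equal_allocation(teachers, 400))
```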

A single sample appropriate for both purposes, i.e., estimation and hypothesis testing, would be possible only if the subgroups of the population actually occurred in equal frequencies. The probability of such an occurrence is too remote to receive consideration in situations dealing with characteristics found in educational and psychological research. The design of a research study must be determined by its purpose, i.e., estimating population characteristics or testing hypotheses.

It is possible, in certain situations, to make approximations of estimates from a sample appropriate for testing a hypothesis, and vice versa, but such statistical treatment is not only inefficient but so complex that it should be undertaken only when an appropriate design cannot be had. If the research worker must resort to such approximations, he must be equipped with statistical competence far beyond that which can be obtained from introductory textbooks such as this one, as well as with educational and psychological theory pertinent to the particular research situation.


VALIDITY AND REPRESENTATIVENESS 
OF SAMPLE DATA 


Regardless of whether the purpose of a study involves estimation or hypothesis testing, two requirements apply equally to both areas of statistical inference. The research worker is constantly confronted with two problems regarding numerical data: (1) Do the numerical data obtained portray the characteristic about which an inference is desired? and (2) Is the group studied like the larger group for which an inference is desired? In many respects these two problems must be answered by the good common sense of an investigator well versed in the field in which the study is made. A capable statistician will point out the necessity for someone in education or psychology to assume the major responsibility for these two considerations. The former will be called validity of the data and the latter representativeness of the data.

The degree to which the validity of the data is an important considera- 
tion varies widely from one research field to another, as well as from one 
research study to another within any given field. For example, some of the 
most satisfactory numerical measurements are of the length and weight 
of physical objects. For most purposes these measurements are relatively 
constant with respect to time and place. 

In contrast to these direct measurements are measurements of human 
characteristics such as abilities, interests, and attitudes. The extent to 
which measurements of such human characteristics are subject to change 
is indicated by the examples in the following paragraphs. 

In public-opinion polling such as occurs during presidential election 
years, it is well known that preferences indicated in September or October 
may not still prevail at the time of the November election. Aside from the 
representativeness of the sample there are issues involved in the validity 
of the data such as (1) the wording of the statement to which respondents 
are to react and (2) the impartiality of the enumerator. Pollsters have 
done well in providing the necessary safeguards for these two considera- 
tions, but they have no control over the right of an individual to change 
his mind between the time of a survey and election day. Thus inferences are usually accompanied by some qualifying statement such as “if the election were held today” or “providing there are no changes in sentiment.”

The validity of the data becomes an extremely important consideration in most research in education and psychology. For example, are test scores, or the resulting I.Q.s, valid data with respect to intelligence? Are achievement test scores valid data with respect to achievement? Generalizations drawn in studies involving achievement are subject to the limitation, either stated or implied, “to the degree that the data are valid indications of achievement.”

The representativeness of the data, or the degree to which the sample is like the population for which inferences are to be drawn, involves two considerations. First, the sample may differ from the population because the sample has been so drawn that it is biased. In the foregoing example of public-opinion polling, it is conceivable that a sample of respondents may be chosen such that the sample is quite unlike the population of voters about whom generalizations are to be made. Such an instance might occur if the sample were chosen entirely from telephone owners. Obviously such a sample would contain a constant bias no matter how carefully the names from the telephone directories were chosen. Since the bias is introduced in the process of inferring from the telephone directory sample to the population of voters, there is no way to evaluate this bias mathematically.


106 STATISTICAL METHODS 


Second, the sample may differ from the population because of fluctuations
which occur whenever a sample has been taken wholly at random, either
in a single population or in the subgroups of a stratified population. The 
fluctuation arising from the sampling process constitutes the framework 
of statistical inference. Provided the condition of random selection is 
fulfilled, this fluctuation, called sampling error, can readily be taken into 
account. Bias in the method of drawing the sample, however, involves 
differences whose magnitudes are unknown and not subject to evaluation 
by mathematics. 
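The difference between sampling error and bias can be illustrated with a small simulation. The sketch below assumes a hypothetical electorate of 10,000 voters in which 4,000 own telephones and the telephone owners score systematically higher on some attitude scale; all sizes, means, and standard deviations are invented for illustration, and `compare_errors` is a hypothetical helper name. A random sample lands close to the population mean, whereas a sample of the same size drawn only from telephone owners stays far from it however carefully those owners are chosen.

```python
import random
import statistics


def compare_errors(n: int = 1_000, seed: int = 1) -> tuple[float, float, float]:
    """Return (population mean, error of a random sample, error of a biased sample).

    Hypothetical population: 4,000 telephone owners and 6,000 non-owners,
    with the owners scoring higher on average.
    """
    rng = random.Random(seed)
    owners = [rng.gauss(60, 10) for _ in range(4_000)]
    non_owners = [rng.gauss(45, 10) for _ in range(6_000)]
    population = owners + non_owners
    parameter = statistics.fmean(population)

    # Unrestricted random sample: every voter has the same chance of selection.
    random_sample = rng.sample(population, n)
    # Biased sample: only telephone owners can ever be chosen.
    biased_sample = rng.sample(owners, n)

    random_error = statistics.fmean(random_sample) - parameter
    biased_error = statistics.fmean(biased_sample) - parameter
    return parameter, random_error, biased_error


parameter, random_error, biased_error = compare_errors()
print(f"population mean {parameter:.2f}")
print(f"random-sample error {random_error:+.2f}, biased-sample error {biased_error:+.2f}")
```

Increasing `n` (up to the 4,000 owners available) shrinks the random-sample error, but the biased-sample error stays near +9 points in this setup, because it comes from the method of selection rather than from chance.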


ESTIMATES AND PARAMETERS 


Descriptive statistical measures such as the mean, the median, per- 
centiles, and the standard deviation have been discussed in previous 
chapters. In many situations these values must be estimated or inferred 
from a sample chosen from a larger group for which information is 
desired. The importance of this type of statistical treatment varies widely 
depending on the applied area. For example, in marketing research and 
in public-opinion polling, the problems consist of inferring, from a sample, 
the characteristics of a larger group. Although research problems in education and psychology seldom involve estimation alone, a consumer's acquaintance with the process of estimation is nevertheless useful to the research worker.

A descriptive statistical measure, such as the mean, obtained from a 
sample is referred to as an estimate or a statistic. The actual value of 
this statistical measure in the population is referred to as a parameter. 
Ordinarily a parameter is unknown; the problem of estimation is concerned with the confidence which can be placed in a sample estimate with regard to an unknown population parameter. An analysis of the kinds of populations and the kinds of samples is perhaps the most suitable approach to that area of statistical inference known as estimation.


TYPES OF POPULATIONS 


Populations may be classified with respect to (1) size, (2) homogeneity, and (3) existence. Thus, populations are finite or infinite, homogeneous or consisting of subpopulations, and actual or hypothetical. Theoretically, at least, the statistical analysis needed in estimation depends upon the foregoing population characteristics. The size of the population about which inferences are to be drawn from a sample may be taken into account by appropriate techniques, later to be shown. Although sample size may be increased so that it represents successively 1 in 100,000, 1 in 10,000, 1 in 1,000, 1 in 100, 1 in 10, and so on, these successively greater proportions of the population included in the sample are more or less unimportant in estimation until the ratio of sample to population is as great as 1 to 10.


STATISTICAL INFERENCE—ESTIMATION 107 
The last-shown ratio is highly arbitrary; the degree of difference is more appropriately shown by

$$1 - \frac{N}{N_{\text{pop}}}$$

where $N$ = sample size and $N_{\text{pop}}$ = population size. An evaluation of this expression reveals that
statistical analysis based upon an infinite population differs but little 
from that based upon a finite population unless the proportion of the 
population included in the sample becomes extremely large. For purposes 
of an introductory discussion of estimating a population parameter from 
a sample, it is possible to limit the discussion to samples drawn from an 
infinite population. 
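How little the population size matters can be checked numerically. A common form of the finite-population correction multiplies the standard error of a mean drawn without replacement by $\sqrt{(N_{\text{pop}} - N)/(N_{\text{pop}} - 1)}$, which is close to $\sqrt{1 - N/N_{\text{pop}}}$ for large populations. The sketch below evaluates that multiplier for the sampling fractions listed above, assuming a hypothetical population of 100,000; `fpc_factor` is an illustrative helper name.

```python
import math


def fpc_factor(n: int, n_pop: int) -> float:
    """Finite-population multiplier for the standard error of a mean,
    sqrt((n_pop - n) / (n_pop - 1)), for a sample of n cases drawn without
    replacement from n_pop cases."""
    return math.sqrt((n_pop - n) / (n_pop - 1))


n_pop = 100_000  # hypothetical population size
for one_in in (100_000, 10_000, 1_000, 100, 10):
    n = n_pop // one_in  # a sample of 1 case in every `one_in`
    print(f"1 in {one_in:>7,}: multiplier = {fpc_factor(n, n_pop):.4f}")
```

Even a 1-in-10 sample reduces the standard error by only about 5 per cent (a multiplier of roughly 0.949), which is consistent with treating the population as infinite in the discussion that follows.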

A single homogeneous population not subject to classification on some basis is almost inconceivable. Human populations include both sexes, various socio-economic levels, various types of home environment, and other considerations too numerous to enumerate. How many of the bases for classification should be considered in a statistical analysis depends upon the characteristic being evaluated. Satisfactory statistical analysis depends upon stratifying where stratification is necessary, i.e., taking into consideration those classifications essentially related to the characteristic being evaluated. For example, the occupation of the father as reported by twenty-year-old individuals may be classified by sex of the respondent. No stratification on the basis of sex of the respondent is suggested, however, unless someone supports the idea that the occupation of a father is related to the sex of his twenty-year-old offspring. In spite of other sex differences known to exist, no satisfactory reason can be advanced for classifying the population of occupations of fathers on the basis of sex of the respondent.

The necessity for stratification on the basis of the various possible 
classifications may be evaluated by good common sense or by an intimate 
acquaintance with the theory and research in any given field or research 
endeavor. It should be apparent that, in inferring a population parameter from a sample estimate, one should not ignore the possibility that the population consists of subpopulations which must be considered in drawing inferences about population characteristics.

Some populations actually exist, whereas others are entirely hypothetical. The sixth-grade pupils in the public schools in this country represent a population which actually exists. On the other hand, the population resulting from a study of some new variety of seed may be hypothetical if all of the newly developed seed has been used in a sample study. In estimation, actually existing populations predominate, whereas hypothetical populations occur more frequently in the testing of hypotheses, later to be discussed.

In summarizing the types of populations to be considered in problems of
estimation arising in educational and psychological research, certain 




population characteristics must be considered and others may very 
well be ignored. The size of the population is relatively unimportant in 
research in education and psychology, since seldom does the sample size 
constitute an appreciable proportion of the population size. From the 
standpoint of a population being hypothetical or actually existing, the 
statistical treatment does not differ. On the other hand, the research 
worker is constantly faced with the decision concerning whether the popu- 
lation is a single homogeneous population or whether it is a composite of 
several populations about which estimates are desired. 


METHODS OF CHOOSING A SAMPLE 


The problem of choosing a sample representative of a population is not 
difficult in theory, but in many studies the method must be modified in 
order to bring the cost of the research into line with available funds. Such 
modifications introduce many varieties of sampling methodology. Regard- 
less of the method, the principle of randomness must be satisfied. The 
most straightforward method of obtaining a representative sample from 
a population consists of choosing the individual cases by unrestricted 
random sampling with the use of a table of random numbers. An unre- 
stricted random sample is defined as a sample so drawn that each member 
of the population has an equal chance of being chosen in the sample. 
This type of sampling method leaves little to be desired whenever (1) 
lack of prior knowledge of population characteristics precludes stratifica- 
tion and (2) costs are not prohibitive. 
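Once the cases are numbered, a computer's pseudo-random generator can stand in for the printed table. The following minimal sketch uses Python's standard `random` module; `unrestricted_random_sample` is an illustrative helper name, the population size of 633 echoes the example given with the table of random numbers, and the seed is arbitrary.

```python
import random
from typing import Optional


def unrestricted_random_sample(population_size: int, n: int,
                               seed: Optional[int] = None) -> list[int]:
    """Draw n distinct case numbers from 1..population_size.

    random.sample selects without replacement, and every case number has the
    same chance of appearing, matching the definition of an unrestricted
    random sample.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), n))


cases = unrestricted_random_sample(population_size=633, n=25, seed=7)
print(cases)
```

Fixing the seed makes the draw reproducible, which is useful when the selection must be documented; omitting it gives a fresh sample on each run.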

Another method of choosing a sample, known as stratified sampling, consists of drawing a random sample within each subgroup of a population which may be subdivided according to certain classifications for which the proportionate number of cases in each subgroup is known. Obviously, if the purpose of the analysis is estimation, the number of cases in each subgroup in the sample should be proportional to that existing in the population. In some studies this method has been modified by choosing the number of cases in each subgroup proportional to the product of the number of cases and the standard deviation of the characteristic being studied in the population subgroup. This modified method of stratified sampling presupposes a reasonably satisfactory knowledge of the population characteristics included in the stratification.
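The two allocation rules can be compared numerically. In the sketch below the subgroup sizes and standard deviations are hypothetical. Proportional allocation uses the sizes alone, while the modified rule (commonly called Neyman or optimum allocation) weights each size by the subgroup standard deviation. Fractional shares are rounded by the largest-remainder method so that the subgroup counts add up to the total sample; `allocate` is an illustrative helper name.

```python
def allocate(n: int, weights: list[float]) -> list[int]:
    """Split n cases among subgroups in proportion to weights.

    Each subgroup first receives the whole-number part of its share; any cases
    left over go to the subgroups with the largest fractional remainders.
    """
    total = sum(weights)
    shares = [n * w / total for w in weights]
    counts = [int(share) for share in shares]
    leftover = n - sum(counts)
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


sizes = [3_000, 1_500, 500]   # hypothetical subgroup sizes in the population
sds = [10.0, 20.0, 40.0]      # hypothetical subgroup standard deviations

proportional = allocate(200, sizes)
neyman = allocate(200, [size * sd for size, sd in zip(sizes, sds)])
print("proportional:", proportional)
print("size x SD:   ", neyman)
```

With these figures the smallest but most variable subgroup receives 50 of the 200 cases instead of 20, which illustrates why the modified method requires prior knowledge of the subgroup standard deviations.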

The foregoing methods of sampling have used the individual as the 
sampling unit. In some studies, from convenience or economy, a group 
of individuals is used as a sampling unit. This method of sampling is 
referred to as cluster sampling. Thus a survey in a city may use a block 
as the sampling unit. The blocks to be included are chosen at random, and
a complete enumeration is made within each block so chosen.
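A cluster sample differs from the preceding methods only in what is drawn at random. In the following sketch the city, its 50 blocks, and the eight households per block are hypothetical, and `cluster_sample` is an illustrative helper name; whole blocks are drawn with `random.sample`, and every household in a chosen block enters the sample.

```python
import random
from typing import Optional


def cluster_sample(blocks: dict[str, list[str]], n_blocks: int,
                   seed: Optional[int] = None) -> list[str]:
    """Pick n_blocks blocks at random, then take every household in each."""
    rng = random.Random(seed)
    # The block, not the household, is the sampling unit.
    chosen_blocks = rng.sample(sorted(blocks), n_blocks)
    # Complete enumeration within each chosen block.
    return [household for block in chosen_blocks for household in blocks[block]]


# Hypothetical city: 50 blocks of 8 households each.
city = {f"block-{b:02d}": [f"block-{b:02d}/house-{h}" for h in range(1, 9)]
        for b in range(1, 51)}
households = cluster_sample(city, n_blocks=5, seed=3)
print(len(households), "households from",
      len({h.split("/")[0] for h in households}), "blocks")
```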

Another type of sample consists of arbitrarily selecting certain segments 
as typical of a population and confining the sample to that segment. 




An example of this type of sample occurs when a state-wide estimate is desired and the sample is taken from a few so-called typical counties. This type of sampling reduces the cost of obtaining estimates, but it places a grave responsibility upon the research worker, who must know in advance whether the chosen segments are in fact representative of the population with respect to the parameter being estimated. The seriousness of this assumption varies depending upon the problem under inquiry.
Cook County would not be considered by anyone as typical of Illinois if 
an estimate of occupational distribution is desired. On the other hand, 
the ratio of male to female births in Cook County may be quite repre- 
sentative of the entire state of Illinois. 

No attempt will be made in this book to describe the various sampling 
methods which should be employed in the many situations which might 
be postulated in which estimates of population parameters might be 
inferred from a sample estimate. The infrequency with which such prob- 
lems are encountered by the research worker in education and psychology 
suggests a limited discussion of the entire field of estimation. The research 
worker in public-opinion polling or marketing research will find useful the work of Yates¹ and Cochran and Cox,² as well as the numerous articles appearing in the professional periodicals dealing with that phase of statistical inference which is concerned with sample estimates of population parameters. The discussion of estimation will here be limited to unrestricted random sampling in a single homogeneous population or in a stratified population.

The Table of Random Numbers. The table of random numbers is an easily applied device for selecting a random sample from a population or subpopulation, each member of which can be identified and numbered. This table, shown in the Appendix, consists of a group of randomly arranged digits.

To select a random sample from a population or subpopulation by means of the table of random numbers, it is first necessary to number the cases to be sampled. Then a pencil point is dropped anywhere on the table of random numbers, and the two two-digit numbers of 49 or less which are nearest the pencil point are noted. These numbers are used to identify the row and column location of the starting point in the table. If the cases being sampled are fewer than 10, the random number found at the starting point identifies the first member of the sample. If the cases being sampled are fewer than 100, the random number at the starting point and the number immediately to the right of it identify the first sample member. If the cases being sampled number less than 1,000, then the random number at the starting point and the two numbers immediately to the right of it identify the first sample member. If the cases being sampled are even more numerous, the foregoing procedure is extended in a logical manner.

¹ Frank Yates, Sampling Methods for Censuses and Surveys (London, Charles Griffin & Company Limited, 1949).
² W. G. Cochran and G. M. Cox, Experimental Designs (New York, John Wiley & Sons, Inc., 1950).

By moving from the starting point downward, upward, sideways, diagonally, or in any combination of these directions, the second member, third member, fourth member, and so on, are found until a sample of the desired size is obtained. If the last entry in a row or column is reached before the sample is completely drawn, it is recommended that a new starting point be found by dropping the pencil point once again and that the foregoing steps be repeated.

The selection of the random numbers which locate the members of the 
sample may occasionally yield numbers larger than the population size. 
For example, a random number of 762 may be drawn when the popu- 
lation size is only 633. Obviously, such a case number does not exist and 
the choice is disregarded. 

After a number has been drawn once, any subsequent drawing of that number is ignored. This avoids the possibility of including any case more than once in the sample.
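The reading rules for the table (use as many digits as the largest case number has, discard numbers larger than the population size, and ignore repeats) can be expressed compactly. In this sketch the "table" is simply a hypothetical string of random digits read from left to right; it does not reproduce the Appendix table, and the pencil-point choice of a starting place is left to the caller. Case numbers are assumed to run from 1 upward, so a group of all zeros is also skipped. `read_case_numbers` is an illustrative helper name.

```python
def read_case_numbers(digits: str, population_size: int, n: int) -> list[int]:
    """Return up to n distinct case numbers read from a run of random digits."""
    width = len(str(population_size))  # e.g. 3 digits for a population of 633
    digits = "".join(ch for ch in digits if ch.isdigit())  # ignore spacing
    chosen: list[int] = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if not 1 <= number <= population_size:
            continue  # no such case, e.g. 762 when there are only 633 cases
        if number in chosen:
            continue  # already drawn; a case may enter the sample only once
        chosen.append(number)
        if len(chosen) == n:
            break
    return chosen


# A hypothetical run of digits, grouped in threes for readability.
run = "762 104 633 104 058 999 271 415"
print(read_case_numbers(run, population_size=633, n=4))  # [104, 633, 58, 271]
```

If the run of digits is exhausted before `n` cases are found, the function returns fewer cases, which corresponds to reaching the end of a row or column and choosing a new starting point.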

The simplicity of the procedure of drawing a random sample by means of a table of random numbers has made it a valuable tool. It is apparent, however, that the procedure is limited to populations and subpopulations each member of which can be identified.


ESTIMATION OF THE MEAN—SINGLE 
HOMOGENEOUS POPULATION 


Estimates of population parameters may be means, totals, standard 
deviations, coefficients of correlation, proportions, or any other descrip- 
tive statistical measures. Of these population parameters, the mean is the 
one for which estimates are most often desired. For this reason, the 
discussion of estimation will be developed in two types of situations, i.e., 
estimation of the population mean in (1) a single homogeneous population 
and (2) a stratified population. 

The fluctuations appearing in means in successive samples have been discussed in the previous chapter. The degree of confidence which can be placed in an estimate is a function of the size of the sample and the amount of variability existing in the population. This confidence, in the classical theory of sampling, was expressed mathematically as

$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{N}}$$

The difficulty arising in the use of this formula in actual research situations is that the population standard deviation is unknown and must



be estimated from a single sample. A clear distinction should be drawn between the standard deviation in a sample or a group and an estimate of the standard deviation in a population made from a sample. As previously shown in the chapter on variability, the formula for the standard deviation for any distribution is

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}}$$

and for a population estimate, except in small finite populations, the formula is

$$s = \sqrt{\frac{\Sigma x^2}{N-1}}$$
These formulas suggest that a sample standard deviation is not the best estimate of a population standard deviation. The bias is represented by the expression

$$\frac{N}{N-1}$$

the ratio of $s^2$ to $\sigma^2$ for the same set of scores, which, for most situations in educational and psychological research, is so nearly unity that for all practical purposes it might be disregarded. On the other hand, there is no satisfactory reason for ignoring this discrepancy, small though it may be. The necessity for using $s$ rather than $\sigma$ becomes paramount whenever more sensitive tests are demanded, such as are needed in stratified populations. Thus the formula for estimating the population standard deviation is always used in the form

$$s = \sqrt{\frac{\Sigma x^2}{N-1}}$$

and the standard error of a mean value as

$$s_{\bar X} = \frac{s}{\sqrt{N}}$$
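The contrast between the two divisors can be seen with Python's standard `statistics` module, where `pstdev` divides the sum of squared deviations by $N$ and `stdev` divides by $N - 1$. The eight scores below are hypothetical; for any such set (not all scores equal), the squared ratio of the two results equals $N/(N-1)$.

```python
import math
import statistics

# Hypothetical scores for a sample of eight pupils.
scores = [72, 80, 65, 91, 77, 84, 69, 75]
n = len(scores)

sigma = statistics.pstdev(scores)  # divides the sum of squared deviations by N
s = statistics.stdev(scores)       # divides by N - 1: estimate of the population SD
se_mean = s / math.sqrt(n)         # standard error of the mean, s / sqrt(N)

print(f"sigma = {sigma:.3f}, s = {s:.3f}, s_xbar = {se_mean:.3f}")
print(f"(s / sigma)**2 = {(s / sigma) ** 2:.4f}; N / (N - 1) = {n / (n - 1):.4f}")
```

With only eight cases the ratio $N/(N-1)$ is about 1.14; with the samples of 25 used below it is about 1.04, which illustrates why the discrepancy is small but not negligible.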
In the classical theory of sampling, the assumption is made that a normal curve table of unit area may be used to evaluate this standard error. This assumption is reasonably satisfactory when the number of cases is large but becomes more and more untenable as the number of cases in the sample decreases. Whenever the population mean and standard deviation are unknown and must be inferred from a sample, the sampling distribution of $t$ is more satisfactory for this purpose. The entries in the $t$-table, shown in the Appendix, become identical with those shown in a table of the normal curve of unit area as the number of cases increases indefinitely.

It can be readily seen that the $t$ distribution is dependent upon sample size. The entries in a $t$-table depend upon a concept known as degrees of freedom, which, although not the number of cases, bears some relationship



to that number. For a sample from a single homogeneous population the number of degrees of freedom (df) is one less than the sample size, the loss resulting from the necessity of computing the deviations from a sample mean rather than from a population mean. For example, in a sample of 25 pupils, the sum of the scores was found to be 1,875 and the sum of the squares of the scores 156,463. The best estimate of the population mean as judged from the sample is 75, the sum divided by 25. The best estimate of the variance as judged from the sample is obtained by reducing the sum of squares of the scores, 156,463, to deviation form by the formula

$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N} = 156{,}463 - \frac{(1{,}875)^2}{25} = 15{,}838$$


which, when divided by 24, one less than the number of cases in the sample, yields a variance of 659.92, or a standard deviation estimate of 25.69. The confidence which can be placed in 75 as an estimate of the population mean may be inferred from computing a standard error of the mean from the formula

$$s_{\bar X} = \frac{s}{\sqrt{N}}$$

where $s_{\bar X}$ is the standard error of the mean when obtained from a sample estimate of variability, and $s$ is the estimated standard deviation in the population. In the situation described the standard error of the mean would be

$$s_{\bar X} = \frac{25.69}{\sqrt{25}} = 5.14$$
This standard error of the mean of 5.14 is a value which suggests, in terms 
of probabilities, the degree of confidence which can be placed in the 
estimate of 75 as the population mean. 
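The arithmetic of this example can be reproduced directly from the two sums given in the text. The short sketch below follows the same steps: deviation sum of squares, variance with $N - 1$ in the denominator, estimated standard deviation, and standard error of the mean.

```python
import math

N = 25
sum_x = 1_875       # sum of the scores, from the example
sum_x2 = 156_463    # sum of the squared scores, from the example

mean = sum_x / N                   # 75.0
ss = sum_x2 - sum_x ** 2 / N       # deviation sum of squares: 15838.0
variance = ss / (N - 1)            # estimate of the population variance
s = math.sqrt(variance)            # estimate of the population SD
se_mean = s / math.sqrt(N)         # standard error of the mean

print(f"mean {mean:.2f}, variance {variance:.2f}, s {s:.2f}, s_xbar {se_mean:.2f}")
```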
To determine the amount of confidence which can be placed in the estimate, this standard error of the mean must first be multiplied by an appropriate value of $t$. An inspection of a table of $t$ indicates that the $t$-value to be used in the multiplication varies with (1) the number of degrees of freedom and (2) the degree of confidence desired. The number of degrees of freedom in estimating the mean value in a single homogeneous population from a sample is one less than the number of cases in the sample, which in the situation just described is 24. The degree of confidence demanded is entirely arbitrary, although general practice suggests using the 5 per cent or 1 per cent levels, the $t$-values for 24 degrees of freedom being 2.064 and 2.797, respectively. When the standard error of the mean of 5.14 is multiplied by these $t$-values, the obtained products of 10.61 and 14.38 establish what are known as the 5 per cent and the 1 per cent fiducial limits for the sample mean of 75. Thus the 5 per cent






fiducial limits are 75 ± 10.61, or 64.39 and 85.61, whereas the 1 per cent fiducial limits are 75 ± 14.38, or 60.62 and 89.38. The procedure for determining the fiducial limits of a sample mean is shown by

$$\bar X \pm t\,s_{\bar X}$$
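The fiducial limits follow from $\bar X \pm t\,s_{\bar X}$. This sketch takes the mean of 75 and the rounded standard error of 5.14 from the example, together with the two tabled $t$-values for 24 degrees of freedom quoted above; `fiducial_limits` is an illustrative helper name.

```python
def fiducial_limits(mean: float, se_mean: float, t: float) -> tuple[float, float]:
    """Return the (lower, upper) limits mean -/+ t * se_mean."""
    half_width = t * se_mean
    return mean - half_width, mean + half_width


# Tabled t-values for 24 degrees of freedom, as quoted in the text.
T_05, T_01 = 2.064, 2.797

low5, high5 = fiducial_limits(75, 5.14, T_05)
low1, high1 = fiducial_limits(75, 5.14, T_01)
print(f"5 per cent limits: {low5:.2f} to {high5:.2f}")
print(f"1 per cent limits: {low1:.2f} to {high1:.2f}")
```

Using the unrounded standard error (about 5.138) would shift the limits by roughly 0.01. If SciPy is available, `scipy.stats.t.ppf(0.975, 24)` and `scipy.stats.t.ppf(0.995, 24)` give approximately 2.064 and 2.797, so the table lookup could be replaced by that call; the sketch keeps the tabled values to match the text.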


For the purpose of describing suitable interpretation of fiducial limits, 
the 5 per cent fiducial limits of 64.39 and 85.61 will be used. The popula- 
tion mean is of course unknown. It either lies within or without these 
fiducial limits. With such an interpretation the probability may be 
thought of as either zero or unity, i.e., it either is or is not between 
these limits. On the other hand, if many judgments are made of popula- 
tion means from sample estimates, the assertion that the population 
mean lies within the 5 per cent fiducial limits will be incorrect in 5 per 
cent of numerous judgments so made. The usual interpretation is that 
the chances are 95 in 100 that the population mean lies within the 5 
per cent fiducial limits, and 5 in 100 that it lies without those limits. The 
latter interpretation is quite satisfactory if fiducial confidence is implied. 

An analogy may make satisfactory interpretation more apparent. 
Suppose a dealer should deal face-down a single card to an individual. 
This card may or may not be the ace of spades. If this card is replaced 
in the deck and reshuffled and again a single card dealt, the card may or 
may not be the ace of spades. If this procedure be repeated indefinitely, 
a person indicating in each case that the card dealt is the ace of spades 
will be correct in 1 of every 52 times and incorrect in 51 of every 52 
times a card is dealt. Thus it may be said that the probability of dealing the ace of spades on any single deal is 1 in 52.

A further example to explain the theory of sampling has been sum- 
marized in Table 23. The mean scores for 100 samples of 25 fifth-grade 
pupils on a geography test, shown in the previous chapter, are shown to- 
gether with other entries needed for the ensuing discussion. Suppose no 
information is known concerning the population except that which might 
be inferred from any one of the 100 samples. Each sample mean shown 
represents the best estimate of the population mean which can be inferred 
from that single sample. The variance for each sample has been estimated by the formula

$$s^2 = \frac{\Sigma x^2}{N-1}$$

where $\Sigma x^2$ is the sum of the squares of the deviations from the sample mean.

The standard error of the mean for each of the 100 samples was obtained from the formula

$$s_{\bar X} = \frac{s}{\sqrt{N}}$$




TABLE 23. Sample Estimates for 100 Randomly Drawn Samples
of 25 Cases Each

                                       FIDUCIAL LIMITS
SAMPLE                             5%                  1%
NO.   MEAN       s²     s_X̄     LOWER     UPPER     LOWER     UPPER
  1  67.88   349.03   3.737     60.17     75.59     57.26     78.30
  2  92.44   540.54   4.650     82.84    102.04     79.48    105.40
  3  71.56   215.17   2.934     65.50     77.62     63.38     79.74
  4  90.20   511.69   4.524     80.86     99.86     77.59    102.81
  5  68.68   570.81   4.778     58.82     78.54     55.36     82.00
  6  89.04   528.48   4.598     79.55     98.53     76.23    101.85
  7  88.68   600.37   4.900     78.57     98.79     75.02    102.34
  8  87.84   534.47   4.624     78.30     97.38     74.95    100.73
  9  88.24   601.34   4.904     78.12     98.36     74.57    101.91
 10  70.92   518.45   4.554     61.52     80.32     58.23     83.61
 11  71.04   544.96   4.669     61.40     80.68     58.03     84.05
 12  87.04   495.81   4.453     77.85     96.23     74.63     99.45
 13  87.00   502.27   4.482     77.75     96.25     74.51     99.49
 14  70.68   631.89   5.027     60.30     81.06     56.67     84.69
 15  87.48   572.24   4.784     77.61     97.35     74.15    101.81
 16  71.00   697.58   5.214     60.24     81.76     56.47     85.53
 17  73.32   375.73   3.877     65.32     81.32     62.51     84.13
 18  86.00   480.66   4.385     76.95     95.05     73.78     98.22
 19  72.84   457.22   4.276     64.01     81.67     60.92     84.76
 20  74.68   253.31   3.183     68.11     81.25     65.81     83.55
 21  86.08   554.25   4.709     76.36     95.80     72.96     99.20
 22  73.04   487.37   4.415     63.93     82.15     60.74     85.34
 23  85.16   436.94   4.181     76.53     93.97     73.51     96.81
 24  74.24   433.77   4.165     65.64     82.84     62.63     85.85
 25  84.84   607.71   4.930     74.66     95.02     71.10     98.58
 26  84.28   497.24   4.460     75.07     93.49     71.85     96.71
 27  74.68   453.98   4.261     65.87     83.47     62.80     86.56
 28  74.44   503.76   4.489     65.17     83.71     61.93     86.95
 29  73.96   616.62   4.966     63.71     84.21     60.12     87.80
 30  84.32   572.76   4.681     74.66     93.98     71.27     97.37
 31  74.28   576.88   4.804     64.36     84.20     60.89     87.67
 32  83.76   465.11   4.313     74.86     92.66     71.74     95.78
 33  75.16   403.39   4.017     66.87     83.45     63.96     86.36
 34  74.40   566.00   4.758     64.58     84.22     61.14     87.66
 35  74.88   467.95   4.326     65.95     83.81     62.82     86.94
 36  83.08   370.70   3.851     75.13     91.03     72.35     93.81
 37  74.84      ...     ...       ...       ...       ...     87.94
 38  83.40   474.43   4.356     74.41     92.39       ...       ...
 39  83.04   429.23   4.144     74.49     91.59     72.49     94.59
 40  75.88   381.61   3.907     67.82     83.94     64.99     86.77
 41    ...   542.69   4.659     73.74     92.98     70.38     96.34
 42    ...   534.44     ...       ...       ...     70.35     96.13
 43    ...   659.92     ...       ...     85.60       ...       ...
 44  83.32   578.48   4.810     73.39     93.25     69.91     96.73
 45    ...      ...     ...     74.57       ...       ...     93.15
 46  83.44   632.48   5.030     73.06     93.82     69.42     97.46
 47  82.72   434.57   4.169     74.12     91.32     71.10     94.34
 48  75.68   503.64   4.488     66.42     84.94     63.17     88.19
 49  76.36   332.07   3.645     68.84     83.88     66.20     86.52
 50  83.64   748.99   5.474     72.34     94.94     68.38     98.90



TABLE 23. (continued)

                                       FIDUCIAL LIMITS
SAMPLE                             5%                  1%
NO.   MEAN       s²     s_X̄     LOWER     UPPER     LOWER     UPPER
 51  82.80   555.08   4.712     73.07     92.53     69.62     95.98
 52  82.80   564.58   4.752     72.99     92.61     69.51     96.09
 53  77.12   228.61   3.024     70.88     83.36     68.66     85.58
 54  83.16   725.11   5.386     72.04     94.28     68.10     98.22
 55  82.16   406.56   4.033     74.84     91.48     70.88     93.44
 56  75.92   588.58   4.852     65.91     85.93     62.35     89.49
 57  75.76   661.27   5.143     65.14     86.38     61.38     90.14
 58  82.36   523.24   4.575     72.92     91.80     69.56     95.16
 59  75.72   722.13   5.375     64.63     86.81     60.69     90.75
 60  82.96   678.71   5.210     72.21     93.71     68.39     97.53
 61  82.08   472.16   4.345     73.11     91.05     69.93     94.23
 62  76.88   492.11   4.437     67.72     86.04     64.47     89.29
 63  77.20   395.17   3.976     68.99     85.41     66.08     88.32
 64  81.88   598.94   4.895     71.78     91.98     68.19     95.57
 65  82.08   714.74   5.347     71.04     93.12     67.12     97.04
 66  81.08   302.08   3.476     73.91     88.25     71.36     90.80
 67  81.40   480.42   4.384     72.35     90.45     69.14     93.66
 68  76.84   677.14   5.204     66.10     87.58     62.28     91.40
 69  77.18   505.54   4.497     67.90     86.46     64.60     89.76
 70  80.76   289.02   3.400     73.74     87.78     71.25     90.27
 71  81.36   603.57   4.914     71.22     91.50     67.62     95.10
 72  77.68   528.48   4.598     68.19     87.17     64.82     90.54
 73  80.68   385.14   3.925     72.58     88.78     69.70     91.60
 74  78.24   256.77   3.205     71.62     84.86     69.28     87.20
 75  77.92   496.33   4.456     68.72     87.12     65.46     90.38
 76  80.72   598.54   4.893     70.62     90.82     67.03     94.41
 77  80.52   442.43   4.207     71.84     89.20     68.75     92.29
 78  80.44   396.76   3.984     72.22     88.66     69.30     91.58
 79  80.48   736.43   5.427     69.28     91.68     65.30     95.66
 80  78.24   601.52   4.905     68.12     88.36     64.52     91.96
 81  78.52   382.93   3.914     70.44     86.60     67.57     89.47
 82  80.32   673.81   5.192     69.60     91.04     65.80     94.84
 83  78.44   487.59   4.416     69.33     87.55     66.09     90.79
 84  78.52   458.80   4.284     69.68     87.36     66.54     90.50
 85  78.66   413.52   4.067     70.27     87.05     67.28     90.04
 86  78.60   556.42   4.718     68.86     88.34     65.40     91.80
 87  79.92   448.24   4.234     71.18     88.66     68.08     91.76
 88  78.64   531.74   4.612     69.12     88.16     65.74     91.54
 89  78.72   436.04   4.176     70.10     87.34     67.04     90.40
 90  79.88   511.69   4.524     70.54     89.22     67.23     92.53
 91  78.72   540.54   4.650     69.12     88.32     65.71     91.73
 92  78.84   478.72   4.376     69.81     87.87     66.60     91.08
 93  78.88   428.03   4.138     70.34     87.42     67.31     90.45
 94  79.72   537.79   4.638     70.15     89.29     66.75     92.69
 95  78.88   575.78   4.799     68.97     88.79     65.46     92.30
 96  78.96   500.37   4.474     69.73     88.19     66.45     91.47
 97  79.52   421.34   4.105     71.05     87.55     68.04     91.00
 98  79.12   326.86   3.616     71.66     86.58     69.01     89.23
 99  79.16   471.89   4.345     70.19     88.13     67.01     91.31
100  79.26   548.66   4.685     69.59     88.93     66.16     92.36




where s is the estimate of population standard deviation obtained by 
extracting the square root of the estimated population variance shown in 
the preceding column of Table 23. The fiducial limits for each sample 
have been established by multiplying the standard error of the mean in 
each sample by 2.064 and 2.797, the values of t at the 5 per cent and the 
1 per cent level respectively with 24 degrees of freedom. 

An inspection of the fiducial limits shown in Table 23 indicates that considerable fluctuation appears from one sample to another, even when the samples are of the same size and have been randomly drawn from a single homogeneous population. In general, fiducial limits vary with (1) the size of the sample; (2) the fluctuations of estimates of the mean such as may appear in random sampling; and (3) the variability within the sample. To the degree that 100 samples are a sufficient number, a statement that the population mean lies within the 5 per cent fiducial limits would be expected to be incorrect about 5 times, and a statement that it lies within the 1 per cent fiducial limits about once.

In actual research situations, information for successive samples is not available; inferences must be drawn from single samples. In another respect, too, the information shown is artificial: the mean of the population is known. This population mean of 79.30 is unobtainable from the information in the table, although inspection will reveal that this value is not unexpected. The population mean lies without the 5 per cent fiducial limits in six of the 100 samples (Nos. 1 to 6) and without the 1 per cent fiducial limits in two of them (Nos. 1 and 2). Even with only 100 judgments, then, the empirical evidence conforms closely to theory.
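The long-run meaning of the fiducial limits can also be checked by simulation with many more than 100 samples. The sketch assumes a normal population with mean 79.30 and a standard deviation of 22 (an assumed value, chosen to be of the same order as the sample variances in Table 23), draws repeated samples of 25, and counts how often $\bar X \pm t\,s_{\bar X}$ contains the population mean; `coverage` is an illustrative helper name.

```python
import math
import random
import statistics


def coverage(t: float, reps: int = 4_000, n: int = 25,
             mu: float = 79.30, sigma: float = 22.0, seed: int = 23) -> float:
    """Share of simulated samples whose limits mean -/+ t * s_xbar contain mu.

    The normal population (mean mu, SD sigma) is an assumption for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        se_mean = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(N)
        if mean - t * se_mean <= mu <= mean + t * se_mean:
            hits += 1
    return hits / reps


five = coverage(t=2.064)   # 5 per cent limits, 24 degrees of freedom
one = coverage(t=2.797)    # 1 per cent limits, 24 degrees of freedom
print(f"5 per cent limits contained the mean in {five:.1%} of samples")
print(f"1 per cent limits contained the mean in {one:.1%} of samples")
```

With 4,000 samples the 5 per cent limits should contain the mean in close to 95 per cent of them, and the 1 per cent limits in close to 99 per cent, in line with the six and two misses observed among the 100 samples of Table 23.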


ESTIMATION OF THE MEAN—STRATIFIED 
POPULATION 


Whenever a population consists of subpopulations, greater confidence 
can be placed in an estimate of a mean if random samples are obtained 
within each subgroup. The numbers of cases in the sample subgroups 
should be proportional to the sizes of the population subgroups, whenever 
population estimates are desired from sample analysis. 

The standard error of a mean in a stratified population will, with few exceptions, be smaller when the stratification is taken into account than when it is ignored. If the stratification is ignored, the fiducial limits are obtained by the procedure described for a single homogeneous population.

As a first step in this procedure, $\Sigma x^2$ is required, i.e., the sum of the squares of the deviations from the mean. In a sample from a single homogeneous population, the deviations are computed from the sample mean. In a sample from a stratified population the deviations are computed




from the mean of the subgroup in which such variates are classified. A $\Sigma x^2$ which has been obtained by combining the squared deviations around subgroup means is referred to as a “within” sum of squares.

An example of the procedure for obtaining the within $\Sigma x^2$ may be taken from a sample of college students drawn from the student body listed in a student directory. If an estimate is desired of the mean amount of money earned each month, it might be useful to stratify on the basis of sex and class level, as shown in Table 24. Such stratification divides the population into 10 subgroups.

TABLE 24. Data Sheet for Stratified Sample
of College Students

                                        SEX
CLASS LEVEL    MALE                                   FEMALE
Freshman       $\Sigma X_1$, $\Sigma X_1^2$, $k_1$    $\Sigma X_2$, $\Sigma X_2^2$, $k_2$
Sophomore      $\Sigma X_3$, $\Sigma X_3^2$, $k_3$    $\Sigma X_4$, $\Sigma X_4^2$, $k_4$
Junior         $\Sigma X_5$, $\Sigma X_5^2$, $k_5$    $\Sigma X_6$, $\Sigma X_6^2$, $k_6$
Senior         $\Sigma X_7$, $\Sigma X_7^2$, $k_7$    $\Sigma X_8$, $\Sigma X_8^2$, $k_8$
Graduate       $\Sigma X_9$, $\Sigma X_9^2$, $k_9$    $\Sigma X_{10}$, $\Sigma X_{10}^2$, $k_{10}$

The steps required in establishing fiducial limits for the mean monthly income are: (1) choosing a suitable sample; (2) computing the within sum of squares; (3) computing the standard error; and (4) establishing fiducial limits.

A suitable sample is here defined as one in which the numbers of cases, $k_1$ to $k_{10}$, in the sample subgroups are proportional to the numbers of cases in the subgroups of the entire student population, and one in which each student in any of the ten subgroups has an equal opportunity of being included in the study. One student directory may be used for identifying the freshman men, who could be indicated by the numbers 1, 2, 3, and so on, until the names in the entire directory are considered. Each of nine other directories could be used in a similar way for the other nine subgroups of students shown in Table 24. The number of individuals in each of the ten subgroups of the population would then be available, and the relative number of cases to be included in each of the ten subgroups is thus apparent. The problem still remains of deciding upon


118 STATISTICAL METHODS 


the proportion of the entire student body to be included in the total 
sample, such as 1 in 100, 1 in 10, 1 in 5. This decision usually depends 
upon the time and money available or justified for the investigation. 

The choice of the particular students who should be included in any 
sample subgroup may be readily determined from a table of random 
numbers. The table is consulted as often as necessary in order to obtain 
the desired number of cases in the subgroup. This use of the table of 
random numbers would be followed for each of the ten subgroups shown 
in Table 24. Thus a sample is obtained that is like the entire population except for such fluctuations as may be expected from random sampling.
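The two-step sampling plan just described (proportional allocation of cases to the subgroups, followed by random selection within each subgroup) can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' procedure. The function names and directory sizes are hypothetical, the sampling fraction is given as "1 in m", and Python's `random` module stands in for a printed table of random numbers.

```python
import random


def proportional_quotas(stratum_sizes, one_in):
    """Cases to draw from each stratum for a sampling fraction of 1 in `one_in`.

    Each quota is rounded to the nearest whole case, so the total may differ
    slightly from (population size) / one_in.
    """
    return {name: round(size / one_in) for name, size in stratum_sizes.items()}


def stratified_random_sample(directories, one_in, seed=None):
    """Draw a proportional stratified random sample (hypothetical helper).

    `directories` maps a stratum name (e.g. "Freshman, male") to the list of
    numbered entries in that stratum's directory. Within every stratum each
    entry has the same chance of selection (sampling without replacement).
    """
    rng = random.Random(seed)  # stands in for a printed table of random numbers
    sizes = {name: len(entries) for name, entries in directories.items()}
    quotas = proportional_quotas(sizes, one_in)
    return {name: rng.sample(entries, quotas[name])
            for name, entries in directories.items()}


# Hypothetical directories for two of the ten subgroups in Table 24.
directories = {
    "Freshman, male": list(range(1, 1201)),
    "Freshman, female": list(range(1, 801)),
}
sample = stratified_random_sample(directories, one_in=10, seed=24)
print({name: len(ids) for name, ids in sample.items()})
```

Rounding each quota to the nearest whole case keeps the subgroup sizes proportional to within one case. With very small subgroups, the investigator may prefer to adjust the quotas by hand.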

The degree of confidence which can be placed in an estimate of a population mean depends upon the amount of individual differences, since the greater the variability present, the wider the range between the fiducial limits. This variability is obtained by considering the amount each value differs from a mean. The sum of the squares of these deviations, $\Sigma x^2$, is required for establishing fiducial limits for a mean in any sample. In a stratified sample, such as the one shown in Table 24, the deviations are found from the mean of the subgroup in which the values occur. Thus, the within $\Sigma x^2$ for the entire sample is obtained as follows:


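$$\Sigma x_1^2 = \Sigma X_1^2 - \frac{(\Sigma X_1)^2}{k_1}$$

$$\Sigma x_2^2 = \Sigma X_2^2 - \frac{(\Sigma X_2)^2}{k_2}$$

$$\vdots$$

$$\Sigma x_{10}^2 = \Sigma X_{10}^2 - \frac{(\Sigma X_{10})^2}{k_{10}}$$
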
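The within sum of squares in any given sample may be found by substituting the numerical quantities for each of the subsamples and then summing the $\Sigma x^2$'s for all subsamples:

$$\Sigma x^2 = \Sigma x_1^2 + \Sigma x_2^2 + \cdots + \Sigma x_{10}^2$$

An alternate method, mathematically identical and more often found convenient, utilizes the sum of the squared raw scores, $\Sigma X^2$, taken throughout all subgroup samples:

$$\Sigma x^2 = \Sigma X^2 - \left[\frac{(\Sigma X_1)^2}{k_1} + \frac{(\Sigma X_2)^2}{k_2} + \cdots + \frac{(\Sigma X_{10})^2}{k_{10}}\right]$$
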
Thus, the within sum of squares for the entire sample is found by subtracting a correction term from the sum of the squared raw scores of all values in the sample, without regard to stratification. The correction term is found by dividing, for each subgroup, the square of the sum of its values by the number of cases in it, and then adding these quotients over all subgroups. The within sum of squares will thus be smaller than the sum of squares from the total sample mean, except in the almost inconceivable situation in which the means of all subgroups are identical.
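The two routes to the within sum of squares, and the claim that it cannot exceed the sum of squares from the total sample mean, can be checked with a short Python sketch. The data and function names are hypothetical. Both functions should give the same value for any set of subgroups.

```python
def within_ss_deviations(groups):
    """Within sum of squares: squared deviations from each subgroup's own mean."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((x - mean) ** 2 for x in g)
    return total


def within_ss_raw_scores(groups):
    """Same quantity by the raw-score method: the sum of X**2 over every case,
    minus the correction term, the sum over subgroups of (sum X)**2 / k."""
    sum_of_squares = sum(x * x for g in groups for x in g)
    correction = sum(sum(g) ** 2 / len(g) for g in groups)
    return sum_of_squares - correction


def total_ss(groups):
    """Sum of squares from the mean of the whole sample, ignoring stratification."""
    values = [x for g in groups for x in g]
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


# Hypothetical monthly earnings for three subgroups.
groups = [[20, 30, 40], [10, 15, 20, 25], [50, 60]]
print(within_ss_deviations(groups), within_ss_raw_scores(groups), total_ss(groups))
# 375.0 375.0 2250.0
```

The within figure (375) is far below the total figure (2,250) here because the three hypothetical subgroup means (30, 17.5, and 55) differ considerably.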
The standard error of the mean may be found from the formula

$$s_{\bar{X}} = \frac{s}{\sqrt{N}}$$

The only difference between this formula for use with a stratified population and that for a single homogeneous population lies in the $s$ term. This estimate of the population standard deviation for the purpose of statistical inference is obtained from the formula

$$s = \sqrt{\frac{\Sigma x^2}{N - 1}}$$

whenever the $\Sigma x^2$ has been found from a general mean in a sample. In a stratified sample, the term $N - 1$ is replaced by $N - 2$, $N - 3$, and so on, the figure subtracted from $N$ representing the number of means from which the deviations have been calculated. Thus, in the example shown in Table 24, in which ten subgroups were involved, the formula would be

$$s = \sqrt{\frac{\Sigma x^2}{N - 10}}$$

where the $\Sigma x^2$ is the within sum of squares.

After obtaining the standard error of the mean, the distance from the sample mean to each fiducial limit may be found by consulting a table of t in the manner suggested for such evaluation in a single homogeneous population. The only exception to the foregoing is in the number of degrees of freedom to be used in entering the table. For a sample from a single homogeneous population, the number of degrees of freedom is $N - 1$, the reduction from $N$ resulting from the calculation of the sum of squares from a single mean.

For a sample from a stratified population, the number of degrees of freedom is $N - n$, where $n$ equals the number of means from which deviations have been computed, i.e., the number of subgroups. Thus, in the example shown in Table 24, consisting of ten subgroups, the number of degrees of freedom is $N - 10$.
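Steps (2) through (4) for a stratified sample can be combined in one hypothetical Python function. It assumes the subgroups were sampled in proportion to their population sizes, so that the ordinary mean of all N cases serves as the estimate. Following the text, it takes the value of t from a printed table rather than computing it: the caller supplies the tabled value for $N - n$ degrees of freedom.

```python
import math


def stratified_fiducial_limits(groups, tabled_t):
    """Fiducial limits for the mean of a proportional stratified sample.

    groups   : list of subgroup score lists (hypothetical data layout)
    tabled_t : t read from a table for df = N - n at the chosen level,
               e.g. the 5 per cent column for 95 per cent limits
    Returns (lower, upper, df).
    """
    n_subgroups = len(groups)
    N = sum(len(g) for g in groups)
    df = N - n_subgroups  # one degree of freedom lost per subgroup mean

    # Within sum of squares by the raw-score method.
    sum_of_squares = sum(x * x for g in groups for x in g)
    correction = sum(sum(g) ** 2 / len(g) for g in groups)
    within_ss = sum_of_squares - correction

    s = math.sqrt(within_ss / df)   # s = sqrt(sum x^2 / (N - n))
    se_mean = s / math.sqrt(N)      # standard error of the mean
    mean = sum(x for g in groups for x in g) / N
    return mean - tabled_t * se_mean, mean + tabled_t * se_mean, df


# Hypothetical subgroups: N = 9, n = 3, so df = 6; tabled t (5 per cent, 6 df) = 2.447.
groups = [[20, 30, 40], [10, 15, 20, 25], [50, 60]]
lower, upper, df = stratified_fiducial_limits(groups, 2.447)
print(df, round(lower, 2), round(upper, 2))
```

If the sample were not proportional, the plain mean of all cases would no longer be a suitable estimate, and the subgroup means would have to be weighted by the population proportions.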
The accuracy with which fiducial limits may be established in either a single homogeneous population or a stratified population is contingent upon normality in the population or in the strata within the population.

In many situations in education and psychology, this condition of normality cannot be demonstrated. The failure of most research situations to meet this requirement, fortunately, is not so serious as theory might suggest. Only in extreme cases of nonnormality are the usual methods of establishing fiducial limits open to question. It is possible, of course, to postulate a population so nonnormal that fiducial limits of a sample mean are almost meaningless. For example, in a university in which 80 per cent of the students report no monthly earnings, a sample may very well yield
a mean with a lower fiducial limit of less than zero, i.e., a negative monthly earning. It should be apparent that responsibility for judging whether the assumption of no great departure from normality is suitable is shared jointly by the statistician and the educator or psychologist who has intimate knowledge of the theory in the area in which the study is proposed.
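The earnings illustration can be made concrete with a small hypothetical sample in which 8 of 10 students report no income and 2 report $100 a month. The sketch below applies the single-population procedure (degrees of freedom $N - 1 = 9$, tabled t = 2.262 at the 5 per cent level) and shows that the lower fiducial limit falls below zero. The figures are invented for illustration only.

```python
import math

# Hypothetical sample of 10 students: 8 report no monthly earnings, 2 report $100.
earnings = [0] * 8 + [100] * 2
N = len(earnings)
mean = sum(earnings) / N
sum_x2 = sum((x - mean) ** 2 for x in earnings)  # sum of squared deviations
s = math.sqrt(sum_x2 / (N - 1))                   # population SD estimate
se_mean = s / math.sqrt(N)                        # standard error of the mean
tabled_t = 2.262  # t at the 5 per cent level for N - 1 = 9 degrees of freedom
lower, upper = mean - tabled_t * se_mean, mean + tabled_t * se_mean
print(round(lower, 2), round(upper, 2))  # the lower limit is negative
```

Here the mean is 20, the standard error is about 13.3, and the 95 per cent limits run from about −10.16 to 50.16. A negative lower limit is an impossible value for earnings, which is the kind of nearly meaningless result the text warns about.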


ESTIMATION OF THE MEAN OF FINITE POPULATIONS


Most populations for which inferences are desired from a sample are 
finite. The standard error of a mean decreases as the size of the sample 
approaches the size of the population. Whenever the sample is the popu- 
lation, the standard error of the mean is zero. In such a situation no 
inference is necessary. The findings in the group studied are what they 
are, without any possibility of generalization whenever the sample and 
the population are identical. 

In most situations in which estimation is appropriate, the population is finite, e.g., a city, a county, a state. The formulas shown in the foregoing discussion are based upon the assumption of an infinite population. Obviously, these formulas should be modified whenever the sample constitutes an important part of the population for which an estimate is desired. The standard error of the mean of a sample from any finite population may be written

$$s_{\bar{X}} = \frac{s}{\sqrt{N}}\sqrt{1 - \frac{N}{N_{\text{pop}}}}$$

where $N$ is the number of cases in the sample and $N_{\text{pop}}$ is the number of cases in the population. From this formula it is obvious that the standard error of the mean approaches zero as the proportion of the population included in the sample increases. Thus, the standard error of the mean computed for an infinite population, $s/\sqrt{N}$, is multiplied by the following factors for the indicated proportions of the population included in the sample:


SIZE OF SAMPLE                MULTIPLIER
(per cent of population)      √(1 − N/N_pop)
10%                           0.95
20%                           0.89
30%                           0.84
40%                           0.77
50%                           0.71
60%                           0.63
70%                           0.55
80%                           0.45
90%                           0.32

From the foregoing information, the necessity of making allowances in fiducial limits in a sample becomes apparent as the proportion of the sample to the population increases. For all practical purposes, the theory of a sample from an infinite population is applicable to a finite population unless the sample is 10 per cent or more of the population for which an estimate is desired.
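The table of multipliers can be reproduced from the finite-population formula with a short Python sketch. The function names are hypothetical. Note that √0.8 ≈ 0.894, so the 20 per cent entry rounds to 0.89 at two decimals. At 10 per cent the multiplier is about 0.95, which is why the infinite-population formulas are usually adequate for smaller sampling fractions.

```python
import math


def fpc_factor(sample_fraction):
    """Finite-population multiplier sqrt(1 - N / N_pop),
    where sample_fraction = N / N_pop."""
    return math.sqrt(1.0 - sample_fraction)


def finite_se_mean(s, n_sample, n_population):
    """Standard error of the mean, corrected for a finite population."""
    return s / math.sqrt(n_sample) * fpc_factor(n_sample / n_population)


for pct in range(10, 100, 10):
    print(f"{pct}%  {fpc_factor(pct / 100):.2f}")
```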


ESTIMATION IN STATISTICAL INFERENCE 


The place of estimation in statistical analysis varies depending upon the research area involved. In public-opinion analysis, the usefulness of the theory of estimation is paramount. There are other areas of research in education and psychology, however, in which it is extremely difficult to find situations in which the theory of estimation may be legitimately applied.

Population information concerning teacher salary, education, or load is often available as a result of administrative routine. Similarly, pupil information, such as age, attendance, aptitude and achievement scores, is often available for the entire population from state records. Estimation of some population characteristic, such as a mean, from a sample is unnecessary in such instances.

Whenever it is necessary to estimate a mean or any other descriptive statistical measure from a stratified population, the sample size in each subgroup should be proportional to that occurring in the population. This procedure is unique to estimation and should not be followed in the testing of hypotheses, the other aspect of statistical inference, which overshadows estimation in importance in actual research in education and psychology. Although it is possible to acquire some information concerning a hypothesis from a sample that is designed for estimation, such an approach requires statistical treatment beyond the scope of this book and, at best, yields a vague approximation of hypothesis testing.


Exercises 


1. Define a representative sample. Does a random sample guarantee representativeness? Why?

2. Cite at least five instances, actual or fictional, in which biased samples have been drawn from populations. Describe in detail the populations and the manner of drawing the samples.

3. List the similarities and dissimilarities between a random sample and a stratified random sample.

4. Differentiate between a “small” sample and a “large” sample in terms of the theory of sampling.

5. As the size of a random sample increases, will there be an increase in the reliability of the sample estimates of the population parameters? Why?

6. Describe at least five different groups of cases which, depending upon the judgment of the investigator, might be considered as populations, or as samples from hypothetical populations.

7. Consider the following statements:

“The probability that a male army recruit at Camp X is red-green color blind is 4 per cent.”

“The probability is 4 per cent that a randomly selected male recruit of Camp X will be found to be red-green color blind.”

a. Are these statements equivalent? Why?

b. Which of the two statements more closely resembles in form the interpretation of fiducial limits?

8. In determining the fiducial limits of the mean, the number of degrees of 
freedom is one less than the number of cases. Why? 

9. A questionnaire was mailed to a population of 800 public school superin- 
tendents in which information was requested concerning their job satisfaction. 
A total of 504 questionnaires were returned. 

a. Would you be willing to consider the results obtained from the 504 
superintendents as probably typical of the population of 800 superin- 
tendents? Why? 

b. Suggest a technique by means of which evidence could be found as to 
the similarity or dissimilarity of respondents and nonrespondents. 

c. If you were to defend the 504 respondents as a sample, would you 
rather consider them as a sample from a real population or a hypo- 
thetical population? Explain. 

10. A popular newspaper in a large city conducted a “straw poll” among the 
citizens of the city in order to determine their reaction to the proposal of locat- 
ing a race track immediately outside of the city limits. All interested readers 
were encouraged to clip the ballots from the newspaper and mail them unsigned 
to the office of the editor. Ballots were printed six times over a period of three 
weeks. Editorially, the newspaper opposed the proposed construction. At the 
conclusion of the poll, the newspaper announced that the citizens of the city 
were overwhelmingly (73 per cent) opposed to the proposal. The number of 
questionnaires returned was 504. 

a. Criticize the procedure followed by the newspaper. 

b. Suggest a more satisfactory means of attacking the problem. 

11. In the 1948 and 1952 presidential elections, the attempts made by na- 
tional public-opinion polling organizations to predict the successful candidate 
have been generally criticized by the press as too “inaccurate.” Do you agree or 
disagree with this opinion? Why? 

12. Evaluate the most recent methods applied by public-opinion polling 
organizations as to: 

a. Sampling technique 

b. Interview methods 

c. Analysis of data 

13. An investigator administered a mechanical comprehension test to a sample of 170 gas station attendants employed by a certain company. The data are summarized below.

$N = 170$          $\Sigma X = 5{,}893$
$\bar{X} = 34.66$      $\Sigma X^2 = 215{,}155$

Compute and interpret the 95 per cent fiducial limits of the mean.


8


Statistical Inference—Testing Hypotheses


Testing hypotheses, rather than estimation, is the area of statistical
inference which is of major concern to the research worker in education 
and psychology. For purposes of statistical inference, the null hypothesis, 
i.e., no population difference, is usually postulated. Evidence is then as- 
sembled to ascertain whether such a postulation is untenable. 

The foregoing statements, no doubt, need further explanation with 
regard to (1) the definition of the hypothesis; (2) methods of stating a 
hypothesis; (3) choice of a criterion to be used; (4) types of populations; 
(5) statistical design; (6) mathematical models; (7) confidence level 
required for rejecting a hypothesis; and (8) standards required for proof. 


DEFINING A HYPOTHESIS 


For purposes of statistical inference, a hypothesis may be defined as a 
tentative assumption, stated as a generalization, which is to be tested 
from a sample. For a problem to involve a hypothesis, it must require a process of generalization after the data have been assembled and the descriptive statistical measures necessary for testing the hypothesis have been found. In general, problems needing solution may be classified into three
groups: (1) those in which estimation is appropriate, (2) those in which 
the testing of hypotheses is required, and (3) those in which statistical 
inference is neither needed nor appropriate. 

Perhaps an example will suffice to bring out the distinction necessary 
for identifying the area to which any given research problem belongs. 

If the mean enrollment in classes in first-semester algebra in a state is
desired, it may be obtained by a complete canvass of all high schools. 
If complete information regarding class size is available from reports to 
the state, no test of significance has meaning since statistical inference 
implies that data have been collected from a sample. If complete class- 
size information is not available and if a complete canvass cannot be justified because of time and expense involved, the mean class size for the
state may be estimated from a suitable sample. The problem in this case 
involves that area of statistical inference known as estimation. If the 
problem is to ascertain the relationship between class size and algebra 
achievement, it involves the testing of a hypothesis. The last type of 
problem, i.e., the testing of a hypothesis, it may be repeated, predomi- 
nates in educational and psychological research. 


STATING HYPOTHESES 


For purposes of informing others of the issue in any research study, 
a hypothesis may be expressed in several ways, but for purposes of 
testing significance the hypothesis must be stated in terms of numerical 
quantities. If a problem is to be stated concerning the mean heights of 
adult men and women, any of the following expressions will convey the 
meaning with respect to mean height: 

(1) Are men or women taller? 

(2) Men are taller than women. 

(3) Women are taller than men. 

(4) Men and women are the same height. 

(5) Men (or women) are taller, either plus or minus by some actual numerical difference, such as 4 inches.

The first expression suffices to inform others of the issue, but cannot be 
considered as a hypothesis in terms of the foregoing definition because it 
is not stated in the form of a positive assertion. From the standpoint of 
statistical inference, all the first three expressions are too indefinite to be 
satisfactory. The fourth statement of no difference in height is known 
as the null hypothesis and the fifth statement can be treated as a null 
hypothesis whenever allowances are made for the postulated mean dif- 
ference. The null hypothesis, then, becomes the statement of a research 
issue which may be evaluated by an appropriate test of significance. 

In addition to informing others of the issue in any research study, the 
hypothesis serves to direct the efforts of the investigator in the collection 
of appropriate evidence. If this directive function of the hypothesis is 
to be effective, however, the hypothesis must be clearly recognized before 
any evidence is assembled. Without the directive function of a hypothesis, 
the evidence collected is likely to be incomplete or inappropriate. With- 
out a hypothesis to guide the collection and analysis of evidence, a re- 
search study may be reduced to sheer activity. The foregoing statement 
should not be interpreted to mean that new hypotheses cannot be formu- 
lated during the course of an investigation, or that an original hypothesis 
should never be abandoned or changed; rather the interpretation should 
be that the research effort becomes more efficient as hypotheses to be 
tested are recognized in the planning stages of each step in the research 
project. 



CHOICE OF A CRITERION FOR TESTING HYPOTHESES


A major problem in any research study arises in choosing a satisfactory 
criterion for the evaluation of a hypothesis. In testing the null hypothesis concerning mean heights of male and female adults, the criterion of heights may be expressed in inches or centimeters. Either is quite satisfactory. The choice of a satisfactory criterion in educational and psychological studies, however, is often not so clear-cut. For example, whenever a hypothesis is to be evaluated in terms of student achievement, the choice of a suitable criterion requires careful consideration. Achievement is usually defined as the degree to which students have attained the important objectives of any educational experience. With such a definition, a research study would have as many hypotheses as proposed objectives. The investigator then must decide whether to (1) test as many hypotheses as there are important objectives; (2) limit his study to one, or perhaps more, objectives, with the resulting limitation in conclusions; or (3) combine the scores or ratings on the important objectives to form a composite criterion. The first alternative, although quite desirable, usually is not feasible from the standpoint of time and expense, as well as of available instruments for evaluating many important objectives. The third alternative, combining objectives for a single criterion, has the disadvantage that the weighting of the different objectives is highly arbitrary. Furthermore, agreement on the relative importance of the objectives cannot be expected, since such importance stems from an individual's philosophy of education. The second alternative, limiting the scope usually to a single
objective, is generally chosen. Such limitation permits a more scientific 
attack on a small segment of achievement, and suggests that other areas 
be explored in subsequent research. 


TYPES OF POPULATIONS IN THE TESTING OF HYPOTHESES


The concept of populations pertinent to estimation must be reconsidered when testing hypotheses. With respect to homogeneity and stratification, the population concepts are identical for these two areas of statistical inference. The differences in population concepts become pronounced in regard to finite-infinite and real-hypothetical populations. Whereas in estimation populations are usually finite and actually exist, in the testing of hypotheses populations are infinite and usually hypothetical.

The concept of an infinite population may be noted in the heights of men and women. The inference desired is that, regardless of the number of men and women, the null hypothesis of zero difference in mean height is either tenable or untenable. The most efficient type of sample to represent an infinite population of this kind is one which includes an equal number of men and women.

The concept of a hypothetical population, although not unique to the social sciences, permeates most research in education and psychology. If a novel method of teaching a unit in some subject is proposed, a research study may well be designed to evaluate this method as compared to the method which has been traditionally employed. Obviously, there is no actually existing population taught by this new method. It may well be that the experimental group of students is the only group which has been exposed to this teaching method. The hypothetical population consists of an infinite number of similar students, taught under conditions similar to those prevailing in the sample group evaluated.
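The earlier remark that the most efficient sample for comparing men and women contains equal numbers of each can be checked numerically. Assuming the two populations have the same standard deviation, the standard error of a difference between two independent means is proportional to √(1/k₁ + 1/k₂). The hypothetical sketch below searches every split of a fixed total of 100 cases and settles on 50 in each group.

```python
import math


def se_difference_factor(k1, k2):
    """sqrt(1/k1 + 1/k2): standard error of the difference between two
    independent means, in units of the common population standard deviation."""
    return math.sqrt(1 / k1 + 1 / k2)


total_cases = 100
best_k1 = min(range(1, total_cases),
              key=lambda k1: se_difference_factor(k1, total_cases - k1))
print(best_k1, total_cases - best_k1)

# A few unequal splits for comparison.
for k1 in (10, 30, 50):
    print(k1, total_cases - k1, round(se_difference_factor(k1, total_cases - k1), 3))
```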


DESIGN OF RESEARCH STUDIES 


The planning of a study requires most careful consideration. Any study should be so designed that available tests of significance will yield satisfactory interpretations. Satisfactory design may be accomplished by (1) delimitation through selection or (2) suitable control, either by stratification or by regression analysis. The method of control which should be employed varies from one research study to another. Perhaps the most understandable method occurs when the study is delimited by selection of the sample. This type of control, obviously, limits the interpretation but does permit a more scientific attack on a research problem. In some situations, some data will be eliminated because of the small number of cases. Thus, in a study involving engineering students, the young women majoring in engineering may constitute so small a number of students that most hypotheses can be tested better when evaluated in terms of the young men, with the young women eliminated from consideration. Delimitation by selection, however, may be suitably employed in situations other than those in which a small number of cases exist.

Whenever stratification of a sample is employed for the purpose of con- 
trol in the testing of a hypothesis, it often is possible to avoid some 
delimitation by selection. If appropriate design and statistical models, 
later to be described, are not chosen, stratification is all but meaningless. 
On the other hand, failure to stratify, with the ensuing assumption 
that the sample may receive analysis as though dealing with a single 
homogeneous population, is indefensible in most studies in education and 
psychology. 

Bases for stratification used in many studies are sex, grade level, socioeconomic level, occupation, and others. Usually a study becomes extremely complicated whenever stratification is attempted upon more than three bases of classification. If more than three bases are considered important, the study is delimited by selection of the sample, which automatically reduces the complexity of the population. Inferences drawn in the study are, in consequence, less far-reaching but are logically more
defensible. In a stratified population, the sample is so chosen that an 
equal number of cases is taken in each of the subgroups to avoid dis- 
proportionality in analysis, and to conform to the usually prevailing con- 
cept of an infinite and hypothetical population. 

Control by means of regression is a suitable procedure in many studies. The procedure, described in detail in many of the ensuing chapters, consists of making allowances, in the evaluation of a criterion, for one or more variable characteristics related to the criterion. The use of the I.Q. for controlling upon student aptitude in a study of psychology achievement, for example, is a good illustration of such regression control. In general, it may be said that control by selection is undertaken to delimit a problem to the point where either regression or stratification is feasible, as well as to eliminate from a study small and atypical groups, the analysis of which would confuse the vital issue under investigation. The relative importance of selection, stratification, and regression for the purpose of control varies widely from one applied field to another. In agricultural research, quite appropriately, selection and stratification predominate almost to the exclusion of regression, whereas in educational and psychological research, regression control requires great emphasis, with less emphasis on stratification control.


MATHEMATICAL MODELS 


A study should be designed by sample selection and stratification in such a manner that an appropriate test of significance is available. Designs and statistical models available for such tests of significance are, to a large extent, the concern of all the discussion from here to the end of this book. Under any design, a study is evaluated in terms of some descriptive statistical measure such as the mean, median, proportion, variance, or coefficient of correlation. The design and statistical model vary depending upon the type of population for which inferences are desired. The single homogeneous population provides the most easily understood examples, although it is perhaps the least frequently encountered situation in actual research. For purposes of this chapter, three designs will be discussed, and other designs will be considered in succeeding chapters. The three to be discussed here are (1) the difference between the means of two samples when each sample has been chosen at random, (2) the difference between the means as well as the standard deviations of two samples when each sample has been chosen at random, and (3) the difference between two means in correlated samples.


CONFIDENCE LIMITS 


Whenever the null hypothesis is postulated, by definition, the difference in the population means is zero. As the difference between the two sample means increases, the probability of the null hypothesis being tenable becomes smaller and
smaller. The probability required for rejecting the null hypothesis is 
highly arbitrary but common practice has been to use the 5 per cent or 
the 1 per cent level. The former level, usually referred to as a significant 
difference, implies that the sample mean difference is so great that it 
would occur in less than 5 per cent of the samples from populations in 
which the mean differences are zero. 
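The meaning of the 5 per cent level can be illustrated by simulation: draw many pairs of samples from one and the same population, so that the null hypothesis is true, and count how often the difference is declared significant. This is a hypothetical sketch, assuming normal populations with equal variances and k = 20 cases per group. Under those assumptions the statistic follows t with 2k − 2 = 38 degrees of freedom, whose tabled 5 per cent value is about 2.024. The population mean of 100 and standard deviation of 15 are arbitrary.

```python
import math
import random


def t_statistic(a, b):
    """t = (mean_a - mean_b) / sqrt(s_a^2 / k_a + s_b^2 / k_b)."""
    ka, kb = len(a), len(b)
    ma, mb = sum(a) / ka, sum(b) / kb
    va = sum((x - ma) ** 2 for x in a) / (ka - 1)
    vb = sum((x - mb) ** 2 for x in b) / (kb - 1)
    return (ma - mb) / math.sqrt(va / ka + vb / kb)


def null_rejection_rate(k=20, reps=4000, tabled_t=2.024, seed=1954):
    """Share of sample pairs, both drawn from one normal population
    (mean 100, SD 15), whose |t| exceeds the tabled value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(100, 15) for _ in range(k)]
        b = [rng.gauss(100, 15) for _ in range(k)]
        if abs(t_statistic(a, b)) > tabled_t:
            rejections += 1
    return rejections / reps


print(null_rejection_rate())
```

With 4,000 replications the printed rejection rate should lie close to 0.05; the exact figure depends on the seed.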

It bears repeating that a significant difference as here used should not 
be construed as an important difference. A difference may be significant, 
because of the large number of cases in the sample, without any known 
social consequence. On the other hand, a difference may be nonsignificant, 
i.e., the null hypothesis cannot be rejected, and yet the sample difference
might be highly important if it actually exists in the population. The test 
of the null hypothesis leads to rejecting or not rejecting the null hy- 
pothesis. It should be especially noted that failure to reject the null 
hypothesis should not be interpreted as equivalent to accepting the null 
hypothesis. The null hypothesis is only one of the hypotheses of an actual 
mathematical difference which might be postulated. The null hypothesis, 
in fact, is not the most probable hypothesis which might be postulated 
except in the unusual situation that the sample mean difference is zero 
or unless some knowledge exists concerning the population other than 
that which may be inferred from the sample. For example, a sample of 
heights of five men and five women would probably yield a nonsignificant 
sex difference in mean height. Then from the information at hand, the 
null hypothesis cannot be rejected but most certainly it should not be 
accepted. 


STANDARDS REQUIRED FOR PROOF 


In the testing of hypotheses, statistical inference may be considered to 
have been completed whenever the null hypothesis cannot be rejected or 
it has been rejected at the 5 per cent or the 1 per cent level. The solution 
of a research problem, however, demands a further step in drawing a 
conclusion. Some positive statement is needed concerning the research
problem. Thus, if achievement is evaluated for two groups of students 
who have been exposed to two different types of educational experiences, 
the difference between experience A and experience B will result in either 
failure to reject, or in the rejection of, the null hypothesis, i.e., no dif- 
ference in mean achievement. Interpretations of these alternatives suggest 
that with nonsignificant differences, available evidence fails to provide 
proof that either A or B is the superior experience. In many situations, a further modification may be made with regard to sample size. Thus, if the available sample consists of only twenty students in each of the A and B groups, available evidence from samples of this size fails to provide proof that either A or B is the superior experience. The investigator thus indicates the paucity of
the data with the implication that a larger sample might or might not 
cause a change in his conclusion. On the other hand, if the sample size 
were large, such as 1,000 in each group, even a cautious investigator 
would not hesitate to conclude from a nonsignificant difference that evi- 
dence tends to show that the difference between the A and B experience 
is small, if indeed any such difference actually exists. In any case such a 
difference with the large samples can be, at best, of little or no social 
importance although it might be of academic interest. 

The interpretation which may be made whenever the null hypothesis 
has been rejected at either the 5 per cent or the 1 per cent level, presents 
some difficulty in semantics. One way the interpretation might be ex- 
pressed is that proof has been found from available evidence that A is 
superior to B as an educational experience. Some objection may be raised 
to use of the word proof in the foregoing statement. If proof is so used, 
it must be interpreted in the light of fiducial probability rather than of absolute demonstration. Since many readers may not be pragmatists, it is more conservative to state that available evidence indicates the superiority of A over B as an educational experience. The use of the word proof is no doubt an overstatement, and the word indicates is an understatement of the degree of confidence which should be placed in interpretations that have been drawn as a result of rejecting the null hypothesis.

Difference Between Two Means—Separate Group Variance—Sample
Groups Equal Size. Many research issues involve testing the null
hypothesis. As a starting point in the discussion of such tests, the
difference between the means of two samples will be considered. It is
assumed in this design that it is unnecessary to subdivide the two groups
into subgroups by stratification; that is, each of the groups to be compared
represents a sample from a single population that is homogeneous with
respect to the criterion to be evaluated. It is further assumed in this
design that no correlation exists in drawing the groups which are considered
as samples. For example, a comparison of the intelligence test scores made
at the age of six years by the first and second child in a family would be
correlated if the sample were assembled so that the first-born and the
second-born children of the same families made up the two groups. On the
other hand, if the sample were assembled so that no family included in the
first-born group could also be included in the second-born group, no
correlation would exist. It may be apparent at this point that a more
satisfactory test of significance in the correlated sample could be had
from a statistical model which takes into account the correlation between
the test scores of siblings. The testing of significance in a correlated
sample, however, will be considered later. At present the
discussion is limited to the design of two noncorrelated samples. The
formula, sometimes referred to as the statistical model, for testing this
design is

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}}$$

where the obtained t can be compared with the tabled values of t for
various degrees of freedom shown in the Appendix. The X̄'s are the means,
and s²_X̄₁ and s²_X̄₂ are the squares of the standard errors of the means in
the first and second sample groups. Each may be obtained from the formula

$$s_{\bar{X}}^2 = \frac{s^2}{k}$$

where k is the number of cases in the sample group and s² is the
population variance estimate from that sample, which, in turn, may be
found from the formula

$$s^2 = \frac{\Sigma x^2}{k - 1}$$

The Σx² is the sum of the squared deviations of the scores in a sample
group from the mean of that group. The number of degrees of freedom to
be consulted in tabled values of t is k − 1 whenever k₁ = k₂.

For convenience, the three foregoing formulas for obtaining the t-value
may be combined into a single formula, expressed as follows:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\Sigma x_1^2}{k_1(k_1 - 1)} + \dfrac{\Sigma x_2^2}{k_2(k_2 - 1)}}}$$
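The combined formula lends itself to a quick arithmetic check. The Python sketch below is illustrative only: it assumes each group is summarized by k, ΣX and ΣX², and the function names `sum_sq_dev` and `separate_variance_t` are hypothetical names chosen for this example, not part of any library.

```python
import math


def sum_sq_dev(k, sum_x, sum_x2):
    """Σx²: sum of squared deviations from the group mean, ΣX² - (ΣX)²/k."""
    return sum_x2 - sum_x ** 2 / k


def separate_variance_t(k1, sum1, sumsq1, k2, sum2, sumsq2):
    """t for two independent means, each group supplying its own variance.

    Follows t = (mean1 - mean2) / sqrt(Σx1²/(k1(k1-1)) + Σx2²/(k2(k2-1))).
    """
    mean1, mean2 = sum1 / k1, sum2 / k2
    # Squared standard error of each mean: s²/k = Σx²/(k(k-1))
    se2_1 = sum_sq_dev(k1, sum1, sumsq1) / (k1 * (k1 - 1))
    se2_2 = sum_sq_dev(k2, sum2, sumsq2) / (k2 * (k2 - 1))
    return (mean1 - mean2) / math.sqrt(se2_1 + se2_2)


# Table 25 summary figures (with and without mimeographed materials)
t_table25 = separate_variance_t(20, 558, 17734, 20, 454, 11694)
print(f"t = {t_table25:.3f}")
```

Applied to the Table 25 figures, the sketch should reproduce the t of about 1.70 worked out in the text; the Table 26 figures give about 2.04.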


To illustrate the use of this formula, a test of significance will be made 
for two groups of 20 students each. One group was, and the other group 
was not, supplied with supplementary mimeographed exercises designed 
to be helpful with respect to the application of principles to new situa- 
tions. 

The scores made on the examination to measure application of prin- 
ciples for the two groups are shown in Table 25. From these scores it was 
found that the mean of 27.9 for the group which had the mimeographed 
materials exceeded the mean, 22.7, of those for whom such materials were 
not provided by 5.2 points. 

The issue now arises whether such a discrepancy of 5.2 points actually
represents a useful supplement to teaching or whether this mean difference
might have resulted from chance in choosing at random groups as small as
twenty each. To test the null hypothesis that the use and nonuse of
mimeographed materials are equally effective, the assumption is made that
the characteristics of individuals in the two samples which might otherwise
be controlled by either stratification or regression analysis may well be
ignored.

TABLE 25. Scores on Application
of Principles Examination

   HAD MIMEOGRAPHED MATERIALS
      YES        NO
      37         36
      18         16
      46         19
      18         25
      53         34
      26         30
      28         29
      33          5
      30         29
      20         18
      35         30
      25         15
      19         19
      18         12
      14         16
      40         30
      19         19
      37         35
      21         21
      21         16
From the information shown in Table 25, the sums and sums of squares,
together with the resulting means and estimates of variance, are as
follows:

               WITH         WITHOUT
               MATERIALS    MATERIALS
  k            20           20
  ΣX           558          454
  X̄            27.9         22.7
  ΣX²          17,734       11,694
  Σx²          2,165.8      1,388.2
  s²           113.99       73.06
  s²_X̄         5.70         3.65
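The summary quantities above can be recomputed from the raw scores. This Python sketch is a check on the arithmetic, assuming the Table 25 columns are entered exactly as printed; `summarize` is a hypothetical helper name.

```python
# Scores on the Application of Principles Examination (Table 25)
WITH_MATERIALS = [37, 18, 46, 18, 53, 26, 28, 33, 30, 20,
                  35, 25, 19, 18, 14, 40, 19, 37, 21, 21]
WITHOUT_MATERIALS = [36, 16, 19, 25, 34, 30, 29, 5, 29, 18,
                     30, 15, 19, 12, 16, 30, 19, 35, 21, 16]


def summarize(scores):
    """Return the summary quantities tabulated in the text for one group."""
    k = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    ss = sum_x2 - sum_x ** 2 / k   # Σx², squared deviations from the mean
    s2 = ss / (k - 1)              # estimate of population variance
    return {
        "k": k,
        "sum_x": sum_x,
        "mean": sum_x / k,
        "sum_x2": sum_x2,
        "ss": ss,
        "s2": s2,
        "se2": s2 / k,             # squared standard error of the mean
    }


for label, scores in (("with", WITH_MATERIALS), ("without", WITHOUT_MATERIALS)):
    summary = summarize(scores)
    print(label, {name: round(value, 2) for name, value in summary.items()})
```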


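The Σx² for each group is obtained as usual from the formula

$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{k}$$

By substitution, the formula

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\Sigma x_1^2}{k_1(k_1 - 1)} + \dfrac{\Sigma x_2^2}{k_2(k_2 - 1)}}}$$

becomes

$$t = \frac{27.9 - 22.7}{\sqrt{\dfrac{2{,}165.8}{(20)(19)} + \dfrac{1{,}388.2}{(20)(19)}}} = 1.700$$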


When comparing this t-value of 1.70 with the tabled values of t shown in
the Appendix, it is necessary to consider the number of degrees of freedom,
which in this case is 19, i.e., (k₁ − 1) or (k₂ − 1), since the squares of the
standard errors have been found from two independent estimates of 20 cases
each. With 19 degrees of freedom this t-value is smaller than the value of
2.093 demanded for significance at the 5 per cent level. Thus the null
hypothesis, that neither advantage nor disadvantage in achievement results
from the mimeographed materials, cannot be rejected. The interpretation
may then be made that, with two samples of 20 students each, the usefulness
of the mimeographed materials has not been demonstrated for an infinite
hypothetical student population from which the 40 students, about whom the
analysis has been made, might be assumed to represent a random sample.

Difference Between Two Groups—Separate Group Variance—Sample 
Groups Unequal Size. In the foregoing discussion the sample groups are of
equal size. In many research situations the samples are not of equal size. The
method of analysis for groups of equal size is also applicable to sample 
groups of unequal size unless one group is extremely small and the other 
extremely large. An example of the analysis with sample groups of un- 
equal size may be shown from available information for 258 junior high 
school girls. These girls were classified on the basis of having many or 
few girl friends. The needed information with respect to I.Q. for these 
girls is shown in Table 26. 


TABLE 26. Intelligence Quotient Data for Girls Having Many
and Few Girl Friends

GIRL
FRIENDS    NO.    ΣX        X̄         ΣX²
Many       137    14,459    105.54    1,539,745
Few        121    12,326    101.87    1,291,896
Total      258    26,785    103.82    2,831,641


The difference in the mean I.Q. is 3.67. The sums of squares of the
within-group deviations needed for the t-formula are

$$1{,}539{,}745 - \frac{(14{,}459)^2}{137} = 13{,}740.03$$

$$1{,}291{,}896 - \frac{(12{,}326)^2}{121} = 36{,}273.88$$


The formula for t is then

$$t = \frac{3.67}{\sqrt{\dfrac{13{,}740.03}{(137)(136)} + \dfrac{36{,}273.88}{(121)(120)}}} = 2.04$$


The number of degrees of freedom requires special consideration
whenever the sample groups are of unequal size. It will be recalled that
the same formula, when used for sample groups of equal size, yields a
t-value which is associated with k − 1 degrees of freedom, where k is the
number of cases in either group. With sample groups of unequal size, the
tabled value of t may be taken for either 136 degrees of freedom or 120
degrees of freedom, or both. If a difference is noted in the t-value entries,
the required t-value lies somewhere between these two tabled values. It is
quite satisfactory to accept as the desired t-value the midpoint of the
entries shown for k₁ − 1 and k₂ − 1 degrees of freedom. In most situations
occurring in educational and psychological research, the number of cases
in each group is sufficiently large that tabled values of t differ but little
for k₁ − 1 and k₂ − 1 degrees of freedom, unless k₁ or k₂ is small and the
other large. Even when the resulting difference in t-values is pronounced,
the midpoint of the tabled t-values yields a satisfactory estimate of the
required t except in situations in which the variances of the sample groups
are decidedly different. On the other hand, the preceding formula for t is
equally useful regardless of similar or dissimilar variances in the sample
groups. In the unusual situation in which there exists an extreme
difference in variance between the two sample groups, together with a
small number of cases in one group and a large number in the other group,
a special technique becomes necessary.²

In the situation in which the analysis has been made, the number of
degrees of freedom is 136 for the group with many friends and 120 for the
group with few friends. Most tables of t do not include entries for more
than 30 degrees of freedom. With tables shown in more detail, followed by
interpolation, the t-values at the 5 per cent level have been estimated to
be 1.978 and 1.980 for 136 and 120 degrees of freedom respectively. Since
the obtained t-value of 2.04 is greater than either of these 5 per cent
values, the difference in the mean I.Q.'s between junior high school girls
who have few and those who have many girl friends is significant for any
hypothetical population from which the 258 girls might be considered a
random sample.

² The research worker confronted with such an unusual situation would do
well to consult … Statistical Methods in Research, in which two techniques
are described, one by Behrens and another by Cochran and Cox. The original
sources may not be readily accessible.

Difference Between Two Variances. Just as the significance of the
difference between two means, which has been described, may be desired,
the significance of the difference between two variances, or standard
deviations, may be needed. The formula for testing the variances employs
a variance ratio, as follows:

$$F = \frac{s_1^2}{s_2^2}$$

where s₁² is the variance in the sample group with the greater variance
and s₂² is the variance in the sample group with the lesser variance. The
F is a new term coined by Snedecor to honor Fisher. Like t, it is a
numerical expression of little meaning except in terms of obtaining
probabilities from a suitably prepared table. Snedecor's table of F is
shown in the Appendix. Since the use of this table for models other than
this one is described in the chapter "Analysis of Variance—Single
Classification," the discussion here will include only the use of this
table for evaluating F with this particular formula.

The table of F has been prepared at the 5 per cent and 1 per cent levels
for use whenever there is a logical reason for designating which of the two
variances to place in the numerator and which in the denominator.
Preparation of the F-table in this manner is to be expected, since nearly
all uses of such a table so dictate. The use of F for the purpose of the
formula

$$F = \frac{s_1^2}{s_2^2}$$

does not involve designating the numerator and denominator by logical
considerations, since the choice of the numerator is determined entirely
by which of the two variances is the larger. Consequently, the tabled
5 per cent and 1 per cent values correspond, for this purpose, to the
10 per cent and 2 per cent levels.

In the example concerning the I.Q.'s for girls with few and many girl
friends, the variance was 101.03, with a resulting standard deviation of
10.05, for the group of 137 girls with many girl friends, and the variance
was 302.28, with a resulting standard deviation of 17.39, for the group of
121 girls with few girl friends. To test the significance of the difference
in variances, then

$$F_{120,136} = \frac{302.28}{101.03} = 2.99$$

The subscripts of F indicate the degrees of freedom, the first being the
number of degrees of freedom associated with the numerator and the second
the number associated with the denominator. The number of degrees of
freedom is 120, (k₁ − 1), for the numerator and 136, (k₂ − 1), for the
denominator. The table of F with 120 degrees of freedom indicated across
the top of the page and 136 degrees of freedom indicated down the side
reveals, by interpolation, F = 1.34 at the 5 per cent level and 1.51 at the
1 per cent level. For the purpose of evaluating the variance ratio, then,
F = 1.34 at the 10 per cent level and F = 1.51 at the 2 per cent level. The
obtained F-value of 2.99 is significant since it exceeds both of the values
found from the table. Girls with many girl friends are more uniform in I.Q.
than are those with few girl friends. The example here chosen is far from
typical of the ratios found in educational and psychological research,
significant differences seldom occurring in the variance ratios except when
the sample sizes are very large.
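The variance-ratio procedure, with the larger variance always in the numerator and the tabled points read at doubled levels, can be sketched in Python as follows. The function name `variance_ratio` is hypothetical, and the critical values 1.34 and 1.51 are simply the interpolated entries quoted in the text, not computed here.

```python
def variance_ratio(var_a, df_a, var_b, df_b):
    """Return F = larger variance / smaller variance and its (numerator, denominator) df.

    Because the larger estimate always goes on top, the 5 and 1 per cent
    entries of an ordinary F-table act as the 10 and 2 per cent levels here.
    """
    if var_a >= var_b:
        return var_a / var_b, (df_a, df_b)
    return var_b / var_a, (df_b, df_a)


# Table 26: many girl friends (k = 137) and few girl friends (k = 121)
var_many = 13_740.03 / 136
var_few = 36_273.88 / 120
f_ratio, (df_num, df_den) = variance_ratio(var_many, 136, var_few, 120)
print(f"F({df_num}, {df_den}) = {f_ratio:.2f}")

# Interpolated table entries quoted in the text, read at the doubled levels
F_10_PER_CENT, F_2_PER_CENT = 1.34, 1.51
print("beyond the 2 per cent point:", f_ratio > F_2_PER_CENT)
```

Ordering the variances inside the function, rather than trusting the caller, mirrors the rule in the text that the numerator is fixed by magnitude alone.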

The method described for testing the significance of the difference
between two means using separate group variance is quite satisfactory
whenever each of the distributions can be considered as a single
homogeneous population, not requiring stratification into subpopulations.
Theoretically, the statistical model requires that the two population
variates be normally distributed. In actual practice, it has been found
that this requirement of normality can be disregarded except in those
situations in which one or both of the distributions departs radically from
normality. The separate group variance method cannot be extended into
those situations in which stratification seems desirable. For this reason,
this method has been largely replaced by the method in which the
denominator in the t-test is found from pooled variance, a model which can
be extended into the analysis of stratified samples.

Difference Between Two Means—Pooled Variance. In the comparison
of two groups, it is possible to test the null hypothesis that both sample
groups represent a single homogeneous normally distributed population.
The statistical model for this design may be expressed as

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s^2\left(\dfrac{1}{k_1} + \dfrac{1}{k_2}\right)}}, \qquad s^2 = \frac{\Sigma x_1^2 + \Sigma x_2^2}{k_1 + k_2 - 2}$$

where s² is the within variance, i.e., its sum of squares has been
calculated by summing the squared deviations of each individual case from
the mean of the group in which that case is found. Experience has shown
that the necessity for a normally distributed population, although
theoretically demanded, may be disregarded except in a situation in which
nonnormality is extreme.

The foregoing formula yields a t-value which indicates, from a t-table,
the probability that both sample groups could have resulted by random
sampling from a single homogeneous population. It therefore evaluates
not only the difference between two means but also the difference between
the two variances. It does represent a satisfactory test of the significance
of the difference between two means whenever the sample variances are
actually estimates of a single population variance. The assumption of a
single population variance is usually made whenever it is impossible to
prove, at the 5 per cent or 1 per cent level, that such a single population
variance does not exist. Some extremists go so far as to suggest that this
statistical model not be employed whenever the estimates of population
variance from the sample groups are significantly different. Evidence
accumulated over the past twenty years suggests that such a standard of
rigidity, although theoretically justified, is unnecessary. Little violence
accompanies the use of this statistical model and subsequent
interpretations in terms of the difference between two means unless the
difference in variance estimates is extreme, a situation seldom encountered
in educational and psychological research. It is becoming evident that
even large differences in the variance ratio have little effect upon the
resulting t-values.

The formula for t using pooled variance, just shown, may be written

$$t = \sqrt{\frac{k_1 k_2 (N - 2)(\bar{X}_1 - \bar{X}_2)^2}{N(\Sigma x_1^2 + \Sigma x_2^2)}}$$
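A short Python sketch of this form of the pooled-variance formula follows, again assuming summary figures k, ΣX and ΣX² for each group; `pooled_t` is a hypothetical name, and the sign of the mean difference is restored after the square root so the direction of the difference stays visible.

```python
import math


def pooled_t(k1, sum1, sumsq1, k2, sum2, sumsq2):
    """Pooled-variance t and its degrees of freedom (N - 2).

    Follows t = sqrt(k1*k2*(N-2)*(mean1-mean2)**2 / (N*(Σx1² + Σx2²))),
    with the sign of the mean difference restored afterwards.
    """
    n = k1 + k2
    diff = sum1 / k1 - sum2 / k2
    # Within sums of squares, each taken about its own group mean
    within_ss = (sumsq1 - sum1 ** 2 / k1) + (sumsq2 - sum2 ** 2 / k2)
    magnitude = math.sqrt(k1 * k2 * (n - 2) * diff ** 2 / (n * within_ss))
    return math.copysign(magnitude, diff), n - 2


for label, args in (("Table 25", (20, 558, 17734, 20, 454, 11694)),
                    ("Table 26", (137, 14459, 1539745, 121, 12326, 1291896))):
    t, df = pooled_t(*args)
    print(f"{label}: t = {t:.3f} with {df} degrees of freedom")
```

Working from unrounded means, the Table 26 figures give about 2.106; the value of 2.105 in the text comes from means rounded to two decimals.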

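This formula is convenient for solving either by machine calculation or
by logarithms. The Σx²'s are added after they have been found in each
sample group from the mean of the sample group in which each case
appears. Actually these within sums of squares are computed from the
more convenient of two formulas, which are

$$\Sigma x_1^2 + \Sigma x_2^2 = \left[\Sigma X_1^2 - \frac{(\Sigma X_1)^2}{k_1}\right] + \left[\Sigma X_2^2 - \frac{(\Sigma X_2)^2}{k_2}\right]$$

or its mathematical identity

$$\Sigma x_1^2 + \Sigma x_2^2 = \Sigma X^2 - \left[\frac{(\Sigma X_1)^2}{k_1} + \frac{(\Sigma X_2)^2}{k_2}\right]$$

where ΣX² is the sum of the squared scores for all cases in both groups.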

The number of degrees of freedom involved in evaluating a t-value
from this formula is (k₁ + k₂ − 2) or (N − 2). This formula, as well as
the one described earlier in this chapter, is based upon independent
samples. If the foregoing formula should be used with correlated sample
groups, the t-value will be an underestimation of significance. Suitable
techniques for correlated samples will be described later in this chapter
as well as in the chapters "Analysis of Variance—Multiple Classification"
and "Analysis of Covariance."

The data shown in Table 25 may be utilized for indicating the procedure
to be followed whenever the sample groups are equal in size. From this
information, the t-value may be computed as follows:

$$t = \sqrt{\frac{(20)(20)(38)(27.9 - 22.7)^2}{40(2{,}165.8 + 1{,}388.2)}} = 1.70$$

This t-value agrees with the t-value obtained by using separate-group
variances in the analysis; when the two sample groups are equal in size,
the two formulas are algebraically identical, so any discrepancy can arise
only from rounding. The essential difference, in this situation, is in the
number of degrees of freedom to be consulted when entering a table of t.
With the separate-group variance t-value the number of degrees of freedom
is 19, whereas with the pooled variance t-value the number is 38. The
tabled values are, at the 5 per cent level, 2.093 and 2.024, and at the
1 per cent level, 2.861 and 2.712, for 19 and 38 degrees of freedom
respectively. As the size of the sample increases, the discrepancy between
the tabled t-values decreases. In fact, most actual research situations in
education and psychology involve a sufficient number of cases so that
differences in t-values are inconsequential, regardless of the use of the
separately considered or pooled variance method.

The pooled variance t-test may also be made with unequal numbers of
cases in the two sample groups, unless one sample group is extremely
small and the other extremely large. In such extreme situations it is
suggested that the larger group be considered as a population and that
the possibility that the smaller sample group could have resulted from
random sampling be evaluated by the methods suggested in the chapter
"Classical Theory of Sampling."

As an example of the application of the t-value obtained from an unequal
number of cases in the sample groups, the information shown in Table 26
will be used. Thus, from the pooled variance assumption,

$$t = \sqrt{\frac{(137)(121)(258 - 2)(105.54 - 101.87)^2}{258(13{,}740.03 + 36{,}273.88)}} = 2.105$$


as compared with the previously computed t-value from separate-group
variances of 2.04. The discrepancy between the two procedures is greater
than usually found, since the variance ratio is much larger than usually
found in actual research situations. In fact, the authors have searched
for this unusual example to indicate the small part which homogeneous
variance plays in a satisfactory test of the difference between two means.

The assumptions of normality and homogeneous variance, necessary by
theory to satisfy the application of the statistical model, may well be
waived except in the most extreme cases.
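The comparison between the two procedures can be reproduced side by side. In this Python sketch (helper names `make_group`, `t_separate` and `t_pooled` are hypothetical), the pooled version uses the s²(1/k₁ + 1/k₂) form, which is algebraically the same as the formula above; with equal group sizes the two t-values coincide exactly, while the Table 26 figures show the gap of about 2.04 against 2.11.

```python
import math


def make_group(k, sum_x, sum_x2):
    """Return (k, mean, Σx²) for a group summarized by k, ΣX and ΣX²."""
    return k, sum_x / k, sum_x2 - sum_x ** 2 / k


def t_separate(g1, g2):
    """Separate-group variances: each mean's squared standard error is Σx²/(k(k-1))."""
    (k1, m1, ss1), (k2, m2, ss2) = g1, g2
    return (m1 - m2) / math.sqrt(ss1 / (k1 * (k1 - 1)) + ss2 / (k2 * (k2 - 1)))


def t_pooled(g1, g2):
    """Pooled within variance s² = (Σx1² + Σx2²)/(N - 2) used for both means."""
    (k1, m1, ss1), (k2, m2, ss2) = g1, g2
    s2 = (ss1 + ss2) / (k1 + k2 - 2)
    return (m1 - m2) / math.sqrt(s2 * (1 / k1 + 1 / k2))


TABLE_25 = (make_group(20, 558, 17734), make_group(20, 454, 11694))
TABLE_26 = (make_group(137, 14459, 1539745), make_group(121, 12326, 1291896))

for name, (a, b) in (("Table 25", TABLE_25), ("Table 26", TABLE_26)):
    print(name, round(t_separate(a, b), 3), round(t_pooled(a, b), 3))
```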

The method of obtaining a t-value by using pooled variance does have 
the advantage, as will be shown in later chapters, of its usefulness in 
extension into stratified groups. This method when extended is called the 
analysis of variance. The analysis of variance has become a powerful tool 
in testing significance and, to a large extent, has revolutionized research 
in all areas, including education and psychology, in which inferences are 
desired from observational data. 

Analysis of Variance—Two Groups. The method of analysis of variance
for the comparison of two groups is mathematically identical with that
which employs pooled variance in arriving at a t-value. It is here included
as a transitional step from testing significance between two means to
more complex designs resulting from stratification. In most situations in
education and psychology, more accurate and more sensitive tests of
significance may be had if the sample groups to be tested are stratified
upon one or more characteristics known to be related to the criterion
which is to be evaluated. Later chapters will include several designs in
which the analysis of variance is appropriate. For the present purpose,
the design is limited to two sample groups such as have been previously
described, i.e., with no stratification pertinent to the analysis within
the two sample groups.

In any given sample it is possible to identify the sources of variation
and the contribution of each to the sum of squares. Thus, adults may be
classified by sex and the heights of the individuals assembled. It is well
known that individual differences in height exist whether or not sex
differences are taken into account. The sources of variation may then be
identified as the differences of each sex mean from the mean of all persons
included in the study and the differences of each person from the mean of
the sex in which that person is included. Thus, the sum of squares of
individual differences in height may be broken into two sources, the sum
of which is the total sum of squares.
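The two-way split can be verified numerically by taking deviations directly. No height data are given, so this Python sketch borrows the Table 25 scores as stand-in groups; `partition_ss` is a hypothetical helper, and the point is only that the between and within parts add up to the total.

```python
# Table 25 scores, used here only as convenient illustrative data
WITH_MATERIALS = [37, 18, 46, 18, 53, 26, 28, 33, 30, 20,
                  35, 25, 19, 18, 14, 40, 19, 37, 21, 21]
WITHOUT_MATERIALS = [36, 16, 19, 25, 34, 30, 29, 5, 29, 18,
                     30, 15, 19, 12, 16, 30, 19, 35, 21, 16]


def partition_ss(groups):
    """Split the total sum of squares into between-group and within-group parts.

    Deviations are taken directly, as in the definitions: total about the
    general mean, within about each group's own mean, and between as the
    group means about the general mean (weighted by group size).
    """
    scores = [x for group in groups for x in group]
    grand_mean = sum(scores) / len(scores)
    total = sum((x - grand_mean) ** 2 for x in scores)
    between = 0.0
    within = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        between += len(group) * (mean - grand_mean) ** 2
        within += sum((x - mean) ** 2 for x in group)
    return total, between, within


total, between, within = partition_ss([WITH_MATERIALS, WITHOUT_MATERIALS])
print(f"total {total:.1f} = between {between:.1f} + within {within:.1f}")
```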

The typical analysis of variance table to test significance whenever
two groups only are involved is:

SOURCE OF        DEGREES OF    SUM OF     MEAN
VARIATION        FREEDOM       SQUARES    SQUARE
Groups           1
Within Groups    N − 2
Total            N − 1


The sum of squares for total is the sum of the squared deviations computed
from the general mean. The sum of squares for within groups is calculated
from the group means. The sum of squares for groups can be had by
subtraction or from

$$\frac{(\Sigma X_1)^2}{k_1} + \frac{(\Sigma X_2)^2}{k_2} - \frac{(\Sigma X_1 + \Sigma X_2)^2}{N}$$


TABLE 27. Analysis of Variance of Information Classified into Two Groups

SOURCE OF        DEGREES OF    SUM OF     MEAN
VARIATION        FREEDOM       SQUARES    SQUARE
Groups           1             270.4      270.4
Within Groups    38            3554.0     93.5263
Total            39            3824.4




The additive property of the sums of squares may be illustrated with the
information shown in Table 25. The usual presentation is shown in Table
27. The sums of squares needed in the analysis may be obtained from the
sums and sums of squares of the original scores, which are:


GROUP                ΣX      ΣX²
With Materials       558     17,734
Without Materials    454     11,694
Both                 1012    29,428
The sum of squares for total is found from the expression

$$\Sigma X^2 - \frac{(\Sigma X)^2}{N}$$

to be

$$29{,}428 - \frac{(1012)^2}{40} = 3824.4$$


This sum of squares may be obtained also by squaring the difference be- 
tween each of the 40 scores and the general mean of 25.3, i.e., the mean 
of all scores without regard to classification. In actual practice, no one 
would resort to obtaining the total sum of squares by the tedious pro- 
cedure of finding the deviations from the general mean, then squaring 
each deviation, and then summing for all cases. Regardless of the method 
used, this total sum of squares does represent, in a mathematical way, 
individual differences among students in achievement. If individual dif- 
ferences did not exist, the sum of squares for total would be zero. 
The sum of squares for groups is found from the expression

$$\frac{(\Sigma X_1)^2 + (\Sigma X_2)^2}{k} - \frac{(\Sigma X)^2}{N}$$

or in the situation indicated

$$\frac{(558)^2 + (454)^2}{20} - \frac{(1012)^2}{40} = 270.4$$


The magnitude of this sum of squares is an indication of the degree to
which the means in the sample groups differ. With identical means this
sum of squares is zero.
The within sum of squares is usually found by subtraction but may be
found by summing the squares of the deviations of all students from the
mean of the group in which the students have been classified. This within
sum of squares may also be found from the expression

$$\left[\Sigma X_1^2 - \frac{(\Sigma X_1)^2}{k_1}\right] + \left[\Sigma X_2^2 - \frac{(\Sigma X_2)^2}{k_2}\right]$$

or its equivalent

$$\Sigma X^2 - \left[\frac{(\Sigma X_1)^2}{k_1} + \frac{(\Sigma X_2)^2}{k_2}\right]$$


Thus the total sum of squares is made up of two mathematical magnitudes
from which it is possible to estimate population variance. These estimates
of population variance are called mean squares and are found by dividing
each sum of squares by the number of degrees of freedom associated with
it. Whenever the ratio of the mean square for groups to that for within
groups, representing as it does the difference between the sample group
means, becomes large, the null hypothesis becomes untenable. Whenever the
significance of the difference between two groups is desired, a t-value may
be obtained from the square root of this ratio, which in this situation is

$$t = \sqrt{\frac{270.4}{93.5263}} = 1.70$$

This t-value is, of course, identical with the t-value obtained from the
formula employing pooled variance, since the two procedures are
mathematically identical.

The analysis of variance with two groups does not demand the same 
sample size in each group, although extreme differences in sample size, 
as previously pointed out, should be appraised with considerable skep- 
ticism. Situations such as shown in Table 26 in which the sample groups 
are unequal, but not extreme, are common in educational and psycho- 
logical research. The analysis is shown in Table 28. 


TABLE 28. Analysis of Variance with Two Groups of Unequal Size

SOURCE OF        DEGREES OF    SUM OF       MEAN
VARIATION        FREEDOM       SQUARES      SQUARE
Groups           1             866.53       866.53
Within Groups    256           50,013.91    195.37
Total            257           50,880.44

$$t = \sqrt{\frac{866.53}{195.37}} = 2.106$$


The sum of squares for total is

$$2{,}831{,}641 - \frac{(26{,}785)^2}{258} = 50{,}880.44$$

and for groups

$$\frac{(14{,}459)^2}{137} + \frac{(12{,}326)^2}{121} - \frac{(26{,}785)^2}{258} = 866.53$$
The difference between the total sum of squares and the group sum of
squares is the within sum of squares. The t-value, obtained by extracting
the square root of the ratio of the mean square for groups to that within
groups, is 2.11 to two decimal places, which agrees, apart from rounding,
with the value of 2.105 obtained by the t-formula using pooled variances.
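Both analysis of variance tables can be generated from the group sums alone. The Python sketch below follows the computational route of the text (total and groups from sums, within by subtraction); `anova_from_sums` is a hypothetical name, and taking t as the square root of F holds only because there are exactly two groups.

```python
import math


def anova_from_sums(groups):
    """One-way analysis of variance from (k, ΣX, ΣX²) for each group.

    Returns sums of squares, degrees of freedom, mean squares and F.
    The within sum of squares is found by subtraction, as in the text.
    """
    n = sum(k for k, _, _ in groups)
    grand_sum = sum(s for _, s, _ in groups)
    grand_sumsq = sum(q for _, _, q in groups)
    correction = grand_sum ** 2 / n                    # (ΣX)²/N
    ss_total = grand_sumsq - correction
    ss_groups = sum(s ** 2 / k for k, s, _ in groups) - correction
    ss_within = ss_total - ss_groups
    df_groups, df_within = len(groups) - 1, n - len(groups)
    ms_groups = ss_groups / df_groups
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total, "ss_groups": ss_groups, "ss_within": ss_within,
        "df": (df_groups, df_within, n - 1),
        "ms_groups": ms_groups, "ms_within": ms_within,
        "F": ms_groups / ms_within,
    }


table27 = anova_from_sums([(20, 558, 17734), (20, 454, 11694)])
table28 = anova_from_sums([(137, 14459, 1539745), (121, 12326, 1291896)])
for name, result in (("Table 27", table27), ("Table 28", table28)):
    # With only two groups, t is the square root of F (1 and N - 2 df).
    t = math.sqrt(result["F"])
    print(f"{name}: MS within = {result['ms_within']:.2f}, t = {t:.3f}")
```

For Table 28 the within mean square works out to 50,013.91/256, or about 195.37, which gives t of about 2.106.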

Difference Between Two Means—Correlated Samples. There are many
research situations in education and psychology which involve correlated
samples. Correlated samples are those in which a member of one group
tends to be like a member of the other group. The members of the two
sample groups may then be paired on the basis of one or more
characteristics pertinent to the criterion about which the sample groups
are to be compared.

For an example of such pairing, the information shown in Table 25 
concerning the sample groups with and without supplementary mimeo- 
graphed materials will be used. The methods heretofore described have 
assumed that no correlation existed, which was not actually true. The
sample groups were graduate students in an educational course during a 
summer session. The students were paired on the basis of employment 
during the previous academic year—a superintendent of schools with a 
superintendent of schools, a college teacher with a college teacher, a high 
school principal with a high school principal, a high school science teacher 
with a high school science teacher, no teaching experience with no teach- 
ing experience, and so forth. 

This correlated design may be evaluated for a significant difference
between two means by the formula

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\Sigma d^2}{N(N - 1)}}}$$

where Σd² = ΣD² − (ΣD)²/N. The N is the number of pairs and D is the
difference between the scores for each pair. The expression X̄₁ − X̄₂ is
equivalent to D̄ = ΣD/N.


In the example shown in Table 25,

$$\Sigma D = (37 - 36) + (18 - 16) + \cdots + (21 - 16) = 104$$

$$\Sigma D^2 = (37 - 36)^2 + (18 - 16)^2 + \cdots + (21 - 16)^2 = 2{,}244$$

$$\Sigma d^2 = 2{,}244 - \frac{(104)^2}{20} = 1{,}703.2$$

$$\bar{D} = \frac{104}{20} = 5.2$$

Substituting into the formula,

$$t = \frac{5.2}{\sqrt{\dfrac{1{,}703.2}{(20)(19)}}} = 2.46$$
The number of degrees of freedom for evaluating this t is 19, one less
than the number of pairs. Since the value of 2.46 is greater than the 
tabled value at the 5 per cent level, the null hypothesis is untenable. 
The usefulness of the mimeographed materials has been demonstrated 
when evaluated in terms of the test scores used. It should be noted that 
when pairing was disregarded the t-value of 1.70 was nonsignificant. The 
advantage of pairing in providing a more sensitive test of significance 
depends upon the degree to which the pairing characteristics are related 
to the criterion, or as usually stated, the effectiveness of the controls. 
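The paired computation can be checked from the raw scores. This Python sketch assumes, as the worked example does, that the pairs are the rows of Table 25 as printed; `paired_t` is a hypothetical helper name.

```python
import math

# Table 25 read row by row as 20 matched pairs (same row = same pairing)
WITH_MATERIALS = [37, 18, 46, 18, 53, 26, 28, 33, 30, 20,
                  35, 25, 19, 18, 14, 40, 19, 37, 21, 21]
WITHOUT_MATERIALS = [36, 16, 19, 25, 34, 30, 29, 5, 29, 18,
                     30, 15, 19, 12, 16, 30, 19, 35, 21, 16]


def paired_t(first, second):
    """t for correlated (paired) samples and its degrees of freedom, N - 1.

    Follows t = mean(D) / sqrt(Σd² / (N(N - 1))), where Σd² = ΣD² - (ΣD)²/N.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same number of cases")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)
    ss_d = sum_d2 - sum_d ** 2 / n
    return (sum_d / n) / math.sqrt(ss_d / (n * (n - 1))), n - 1


t, df = paired_t(WITH_MATERIALS, WITHOUT_MATERIALS)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Because only the differences enter the formula, the same function applies whether the pairs are matched individuals or the same individual measured twice.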

The method just shown for samples in which the individuals are paired
does not directly indicate the effectiveness of the controls, although some
clue may be had from a comparison of the t-values with and without
control. Another method for the analysis of this design is shown in the
chapter, “Analysis of Variance—Multiple Classification.” The two meth- 
ods are mathematically identical as far as testing the significance between 
two means, but the latter method in addition permits an evaluation of 
the effectiveness of the controls which are used. 

The technique of pairing may be done by pairing the same individuals 
or different individuals. In some research situations the design of pairing 
an individual with himself leaves little to be desired. Many studies, how- 
ever, must of necessity be based upon independent samples. The problem 
then becomes one of noting the characteristics to be controlled. If the 
characteristics are variable characteristics, the techniques shown in the 
chapter dealing with covariance are recommended, although many re- 
search workers will pair individuals on such characteristics. For example, 
it is not unusual to find a study in which pupils are paired on the basis 
of I.Q. when evaluated on some evidence of academic achievement. Although
the method of covariance analysis is preferred in such situations, the
method just shown for t in correlated samples will yield similar
interpretations when large numbers of cases are involved. The t-analysis
just described is particularly useful for controlling on nonvariable char- 
acteristics for which there are insufficient numbers of cases of one or more 
types to permit stratification. 


Exercises 


1. Why is the null form of a hypothesis usually considered the most
appropriate in a test of significance for a difference between two statistics?
2. Describe in one sentence the fundamental purpose of the t-test.
3. If the difference between two sample means has been shown to be
nonsignificant, are you justified in saying that the two samples very
probably come from the same population? Why?
4. After testing a null hypothesis by means of a t-test in which a nonsignifi- 
cant t-value was found, an investigator considered two possible interpretations: 
“The null hypothesis is accepted.” 
“Insufficient evidence has been found to reject the null hypothesis.” 
Which of the two is to be preferred? Why? 
5. Is it possible that a significant t-value resulting from the test of
significance of the difference between two sample means could have resulted
from a difference between two characteristics other than the sample means?
Explain.
6. Suppose that the level of significance required for rejection of the null 
hypothesis were considered to be 1 per cent rather than 5 per cent. 
a. Would this change increase or decrease the likelihood of rejecting a null 
hypothesis when it is actually true? 
b. Would this change increase or decrease the likelihood of retaining the 
null hypothesis when it is actually false? 
c. On the basis of your answers to the foregoing questions, would you be
willing to argue that the 1 per cent level is superior to the 5 per cent
level for the purpose stated? Why?
7. Evaluate the following statement: “A test of significance cannot prove the 
truth or falsity of the null hypothesis; it can only demonstrate its improbability.” 
8. A sample of 100 girls who were 59 months of age and who had spent their 
entire lives in an orphanage were administered a form board test. For comparison 
purposes, a sample of 100 girls of the same age who had been raised in private 
homes were given the same test. The results were tabulated in terms of seconds 
required to place the blocks in the proper holes, and were as follows: 


GROUP           k      ΣX       ΣX²
Orphanage       100    7,609    592,833
Private Home    100    8,128    647,694


a. State the null hypothesis which can be tested by these data. 
b. Compute a t-test and interpret the resulting t-value. 

9. An admissions officer of a large university wished to determine whether 
male freshmen enrolling in social science curricula differed from male freshmen 
enrolling in agricultural curricula with respect to linguistic scores on a certain 
scholastic aptitude test. The following data were gathered from one entering class 
of freshmen. 


GROUP                     N       ΣX          ΣX²
Social Science Majors    198    23,721    2,685,367
Agricultural Majors      591    67,991    8,266,065

a. Identify a hypothetical population of which the foregoing data may be
considered a sample.
b. State the null hypothesis which could be tested by these data.
c. Compute t and interpret.




10. A group of 40 over-achievers and 40 under-achievers were identified among 
the members of a senior class of an eastern high school. Over-achievers were 
defined as those whose academic achievement exceeded that expected on the basis 
of a scholastic aptitude test. Under-achievers failed to achieve in school as well 
as expected on the basis of the same test. Both groups were identified by regres- 
sion techniques. Reading speed scores (X) for both groups are summarized below. 


GROUP              N      ΣX        ΣX²
Under-achievers   40    4,917    602,193
Over-achievers    40    5,473    771,021


a. State the null hypothesis comparing the two groups with respect to 
reading speed scores. 
b. Test the hypothesis by means of a t-test. 
11. Thirty freshman students of a general psychology class were given a
true-false test at the first meeting of the class. The test was composed of
statements which seemingly had a scientific basis in fact, yet which were, for
the most part, nothing more than restatements of widespread superstitions. At
the conclusion of the semester the test was again administered to the class.
The results are tabulated below in terms of the number of statements correctly
identified as scientifically accurate or inaccurate.

STUDENT   PRETEST SCORE   FINAL TEST SCORE
   1           57               73
   2           97              113
   3           68               92
   4           94              102
   5           84              113
   6           87              101
   7           66               81
   8           84               99
   9           79              103
  10           75              109
  11           79              110
  12           60               70
  13           78              105
  14          121              150
  15           85              111
  16           80               98
  17          102              132
  18           73               91
  19           86              112
  20           99              123
  21           83               96
  22           88              119
  23          122              153
  24           77               91
  25           80              104
  26           72              103
  27           76              113
  28           82              106
  29           69               88

a. The instructor wishes to know whether the students improved their
scores significantly by the end of the semester. State this hypothesis in
null form.
b. Test the null hypothesis by means of a t-test and interpret your
answer.


9 


Chi Square 


In many investigations in education and psychology the elements or
cases in a sample can be classified into categories representing one or
more characteristics. The classification process consists essentially of
identifying the characteristics of each individual in the sample and then
counting the individuals classified into each category.

After the classification has been accomplished, the research worker may 
then be interested in evaluating the discrepancy between the number of 
sample cases actually observed in each category and the number of 
sample cases which would be expected in each category on the basis 
of some hypothesis about the distribution of cases in the population. Chi 
square is a statistical technique which enables the investigator to eval- 
uate the probability of obtaining differences between the actual and ex- 
pected frequencies in the categories of one or more classifications as a 
result of sampling fluctuation. 

A treatment of chi square can be subdivided on the basis of the man- 
ner in which the expected frequencies in each category are obtained. In 
some instances, the expected frequencies for a particular classification 
can be obtained from such sources of information as census data, or from 
rational hypotheses. On other occasions, no satisfactory postulate about 
the population can be made, and the investigator is forced to resort to the 
information in the sample itself to obtain the expected frequencies. 


THE USE OF CHI SQUARE IN ESTIMATION 


Through the use of chi square an investigator can evaluate the prob- 
ability that any sample distribution differs from the distribution of cen- 
sus, or other rational information, by more than would be expected from 
sampling fluctuation. Many investigations are based upon samples from 
populations about which considerable information is available. It is often 
possible to postulate the frequencies to be expected in a class on the basis 
of this population information. For example, an investigator was inter- 
ested in the sex ratio of graduating seniors of public high schools in a 



midwestern state. To obtain evidence about the ratio, a random sample 
of 2508 students was selected. The sample was found to include 1210 
boys and 1298 girls. 

To determine whether graduation tendency was a sex characteristic,
it was considered desirable to compare the sample evidence with informa-
tion about the population from which the sample was obtained. On the
basis of available census data regarding the sex ratio of individuals of
twelfth-grade age in the state, the population ratio was assumed to be
51 males to 49 females; expressed as percentages, the population was
51 per cent male and 49 per cent female.

If high school graduation were not a sex characteristic, the expected 
number of boys to be found in a sample of 2508 cases according to the 
population information would be 


(0.51) (2508) = 1279 
whereas the expected number of girls would be 
(0.49) (2508) = 1229 


Both the actual and expected frequencies for this sample are shown in 
Table 29. 


TABLE 29. Actual and Expected Frequencies of Boys
and Girls Graduating from Public High Schools

SEX      ACTUAL   EXPECTED
Boys      1210      1279
Girls     1298      1229
Total     2508      2508


Inspection of these data indicates that the actual and expected fre-
quencies differ. The size of these differences provides the evidence for
evaluating the probability that differences of this magnitude might have
resulted from sampling fluctuation. Obviously, the greater the differences,
the less the probability that they might have resulted simply from chance
variation from the population distribution.

Chi square, in effect, determines whether the sample frequencies in a
classification, such as the frequencies in the foregoing problem, are sig-
nificantly different from those which would result if only chance factors
were operating. Chi square is computed from the formula

$$\chi^2 = \sum \frac{(\text{Actual Frequency} - \text{Expected Frequency})^2}{\text{Expected Frequency}}$$


Thus, when the data in Table 29 are substituted,

$$\chi^2 = \frac{(1210 - 1279)^2}{1279} + \frac{(1298 - 1229)^2}{1229} = 7.596$$
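The computation above can be sketched in Python as a minimal illustration. The helper names `chi_square` and `expected_from_proportions` are hypothetical and do not come from the text. The postulated population proportions are assumed to be supplied directly, and the `digits` parameter optionally rounds the expected frequencies to whole cases, as was done by hand for Table 29.

```python
def chi_square(observed, expected):
    """Sum over categories of (actual - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_from_proportions(proportions, n, digits=None):
    """Expected frequencies n * p for postulated population proportions.

    digits=None keeps full precision; digits=0 rounds to whole cases,
    as in the hand computation for Table 29.
    """
    expected = [p * n for p in proportions]
    if digits is not None:
        expected = [round(e, digits) for e in expected]
    return expected


observed = [1210, 1298]                      # boys, girls (Table 29)
expected = expected_from_proportions([0.51, 0.49], sum(observed), digits=0)
chi2 = chi_square(observed, expected)
df = len(observed) - 1                       # only restriction: totals agree
print(expected, round(chi2, 3), df)          # [1279.0, 1229.0] 7.596 1
```

If the expected frequencies are left unrounded (`digits=None`, giving 1279.08 and 1228.92), the result is about 7.61, and the conclusion does not change. The same function applied to the coffee-panel data of Table 31, `chi_square([216, 184], [200, 200])`, gives 2.56.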

To evaluate the probability that a chi square of a given size is within
the expectation of chance fluctuation, it is necessary to consider the
number of categories involved in the classification. As the formula shows,
each category contributes a term to the total, and since no term can be
negative, chi square tends to increase as the number of categories
increases. To allow for this, tables of chi square have been prepared for
various numbers of degrees of freedom.

Whenever the only restriction placed upon the computation of the ex- 
pected frequencies in a single classification is that their total be the same 
as the observed frequency total, the number of degrees of freedom is one 
less than the number of categories in the classification. 

In the foregoing example in which the expected frequency total for 
the two categories has been forced to agree with the sample total, the 
number of degrees of freedom is one. It may be noted that since the ex- 
pected frequency total is fixed, establishing the expected frequency of 
boys as 1279 automatically determines the expected frequency of girls 
as 1229. Only one expected category need be filled in this case, for the 
other can be determined by subtraction. Determining the number of cate- 
gories which need to be filled through computation based on the popula- 
tion information is another way of describing the degrees of freedom for 
problems of this type. 

To evaluate the probability of obtaining a chi-square value of 7.596
with one degree of freedom, the table of chi square shown in the Appendix
is consulted. The table shows that a value as large as 7.596 (the 1 per cent
value for one degree of freedom is 6.635) would be expected in less than
1 per cent of random samples of the same size drawn from the population
if there were no sex bias in graduation tendency. Evidence clearly indicates
that more girls and fewer boys were graduated from public high schools
of this state than would be expected from the postulated ratio of 51 males
to 49 females in the population.

If the entries in a chi-square table are compared with the entries in a
t-table, it can be seen that the square root of chi square, or chi, with one
degree of freedom corresponds to values of t with infinite degrees of
freedom. For example, the square root of 6.635 is 2.576, the 1 per cent value
of t with infinite degrees of freedom. It is therefore possible to consult a
table of t with infinite degrees of freedom to evaluate the significance of a
chi value which has been derived from a chi-square value with one degree
of freedom. This relationship does not hold, however, when the chi-square
value has more than one degree of freedom.
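Because chi with one degree of freedom corresponds to t with infinite degrees of freedom (that is, to the normal deviate z), the table lookup for one degree of freedom can be replaced by a normal-curve computation. The sketch below illustrates this. `p_value_1df` is a hypothetical helper name, and it uses only the standard library's `math.erfc`.

```python
import math


def p_value_1df(chi2):
    """Upper-tail probability P(chi square >= chi2) for one degree of freedom.

    With one degree of freedom, chi = sqrt(chi2) behaves like |z| (t with
    infinite degrees of freedom), so the two-tailed normal probability
    applies: 2 * P(z >= chi) = erfc(chi / sqrt(2)) = erfc(sqrt(chi2 / 2)).
    """
    if chi2 < 0:
        raise ValueError("chi square cannot be negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


examples = {"Table 29": 7.596, "Table 31": 2.560}
for label, value in examples.items():
    print(label, value, round(p_value_1df(value), 4))
```

The probability for 7.596 falls below 0.01, which agrees with the table lookup above. The probability for 2.560 comes out close to 0.11. This approach works only for one degree of freedom, as the text notes.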

Another example of the application of chi square can be found in
studies of college freshman enrollment figures. For example, registrars
are sometimes interested in the nature of the high schools from which
the enrolling freshmen graduated. A particular college may wish to know
whether the tendency to matriculate at that college the September fol-
lowing high school graduation is a function of the type of high school
organization from which the student was graduated. Let it be supposed
that the college is concerned only with freshmen who were graduated
from high schools in the state in which the college is located and that the
freshman enrollment is distributed as shown in Table 30. The expected
frequencies are obtained from the total number of June graduates from
each type of high school in that state.

TABLE 30. Actual and Expected Frequencies of College Freshmen
Graduated from Various High School Types

TYPE OF ORGANIZATION   PER CENT OF TOTAL      NO. OF FRESHMEN
OF HIGH SCHOOL           JUNE GRADUATES     ACTUAL    EXPECTED
8-4                            26              62       49.92
6-6                            23              48       44.16
6-3-3                          31              53       59.52
6-2-4                          20              29       38.40
Total                         100             192      192.00

If matriculation tendency were not a function of the type of high school,
the 8-4 type high school, which graduated 26 per cent of the total number
of June high school graduates, should contribute (0.26)(192), or approxi-
mately 50, freshmen to the September class. In the same manner, the 6-6
type high school, which graduated 23 per cent of the total in June, should
contribute (0.23)(192), or slightly more than 44. The 6-3-3 type and the
6-2-4 type, which graduated 31 per cent and 20 per cent respectively,
would contribute about 60 and 38 freshmen respectively.

Since the total of the expected frequencies and the total of the actual
frequencies have been forced to agree, the number of degrees of freedom
is three, one less than the number of categories in which enrollment figures
have been classified. With the proper substitution of the data in Table 30,

$$\chi^2 = \frac{(62 - 49.92)^2}{49.92} + \frac{(48 - 44.16)^2}{44.16} + \frac{(53 - 59.52)^2}{59.52} + \frac{(29 - 38.40)^2}{38.40} = 6.272$$

This value with three degrees of freedom is larger than the 10 per cent
value given in the chi-square table (6.251) but smaller than the 5 per cent
value (7.815). Since the 5 per cent level has not been reached, evidence
is insufficient to indicate that the tendency to matriculate at this
particular college the September following graduation from high school is
related to the type of high school from which the freshmen were
graduated. Stated otherwise, it has not been possible to demonstrate that
variations among different types of high school in matriculation at this
college are greater than those attributable to sampling fluctuation.
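With more than one degree of freedom the normal-curve shortcut no longer applies. The sketch below uses a hypothetical helper, `chi2_sf`, to compute the upper-tail probability directly and then applies it to Table 30. The helper relies on standard closed-form finite sums that hold for an integer number of degrees of freedom. These are a general result, not a method given in this book.

```python
import math


def chi2_sf(x, df):
    """P(chi square >= x) for a positive integer number of degrees of freedom.

    Uses the finite-sum closed forms available for integer df:
      even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
      odd df:  erfc(sqrt(x/2))
               + sqrt(2x/pi) * exp(-x/2) * sum_{k=1}^{(df-1)/2} x^(k-1) / (1*3*...*(2k-1))
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k                      # (x/2)^k / k!
            total += term
        return math.exp(-x / 2) * total
    total, term = 0.0, 1.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)                  # x^(k-1) / (1*3*...*(2k-1))
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


# Table 30: high school types 8-4, 6-6, 6-3-3, 6-2-4
observed = [62, 48, 53, 29]
per_cent_of_graduates = [26, 23, 31, 20]
n = sum(observed)                                            # 192
expected = [p / 100 * n for p in per_cent_of_graduates]      # 49.92, 44.16, ...
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p = chi2_sf(chi2, df)
print(round(chi2, 3), df, round(p, 3))
```

The probability for Table 30 lands between 0.05 and 0.10, consistent with the text. As a check against the table in the Appendix, `chi2_sf` returns approximately 0.05 at the familiar 5 per cent points: 3.841, 7.815, and 12.592 for one, three, and six degrees of freedom.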

Still another example of the application of chi square to problems in
which postulated expected frequencies can be used is an outgrowth of the
analysis of results from consumer opinion panels. Let it be assumed that
a panel of 400 housewives has been asked to test and then give their
preferences regarding two brands of coffee, Brand X and Brand Y. An
appropriate analysis is to determine whether the difference between
preferences in the sample is greater than would be expected from random
sampling from a population in which opinions were equally divided. If
population preferences for each brand were equally distributed, then the
expected frequency for each brand in any random sample would be 50 per
cent of the panel, (0.50)(400), or 200. The actual and expected frequencies
are shown in Table 31.


TABLE 31. Actual and Expected Frequencies of 
Panel Preferences for Two Brands of Coffee 


FREQUENCY 
PREFERENCE ACTUAL EXPECTED 
Brand X 216 200 
Brand Y 184 200 
Total 400 400 


Substituting into the formula for chi square,

$$\chi^2 = \frac{(216 - 200)^2}{200} + \frac{(184 - 200)^2}{200} = 2.560$$


Since this is a single classification and the only restriction placed upon
the computation of chi square has been that the expected frequency total
equal the actual frequency total, the number of degrees of freedom is one
less than the number of categories, or one. Consulting the table of chi
square with one degree of freedom, it is found that the value of 2.560
fails to reach the 5 per cent level of significance (3.841). Evidence is
insufficient to indicate that preferences for these brands diverge more
widely than is attributable to sampling fluctuation, since nearly 11 per
cent of random samples of this size would be expected to yield differences
as great as or greater than those obtained here if there were no population
differences between the brand preferences.


THE USE OF CHI SQUARE IN TESTING HYPOTHESES


Frequently an investigator is interested in the number of elements or
cases in a sample which have been grouped simultaneously according to


two classifications. As indicated in an earlier chapter, the tabular pres- 
entation of a two-way classification results in what is usually called a 
contingency table. The cells of a contingency table contain the number 
of cases which reflect characteristics involved in the two classifications, 
one classification being noted in the stub of the table and the other in the 
table heading. Cells arranged horizontally are designated as rows and 
cells arranged vertically are designated as columns. The size of con- 
tingency tables is described by indicating either the number of rows and 
the number of columns in the table, or the product of these, which is the 
total number of cells in the table. For example, a contingency table hav- 
ing two rows and two columns might be referred to as a 2 x 2 table or a 
four-cell table. 

When expected frequencies cannot be obtained from any other source 
of information, the row and column totals of a contingency table may be 
used to compute the expected frequencies. It should be remembered that, 
whenever possible, postulated expected frequencies are used. Many times, 
however, the necessary information is not available and the expected 
frequencies must be determined from information in the sample itself. 

Four-Cell Contingency Tables. Chi square is used with contingency
tables to evaluate the probability that discrepancies as great as or greater
than those shown in a table might have resulted from sampling fluctua-
tion. For example, a sample survey may suggest that two cities differ in
the percentages of one-family residences owned and rented. In the
hypothetical example shown in Table 32, a sample of 200 one-family homes
has been taken at random from each of two cities, and 55 per cent and
40 per cent of the homes, respectively, have been found to be owned by the
occupants. The obvious inference is that ownership prevails more often in
City A than in City B. Yet the actual situation, if known, might not bear
out the inference drawn from the examination of the sample. It is possible
that no differences prevail other than those which might be expected from
fluctuations of random sampling.

TABLE 32. Ownership of One-Family Homes in Two Cities

              OWNED               RENTED
CITY      N    PER CENT      N    PER CENT
A        110      55         90      45
B         80      40        120      60

In computing chi square, the actual frequencies are utilized and the
percentages disregarded, even though it may be desirable to include
percentages in the final research report. Table 33 has been prepared from
Table 32 by eliminating the percentages and substituting in their place
expected frequencies. These expected frequencies are determined so that
no differences in ownership or renter status appear in the table.

TABLE 33. Actual and Expected Frequencies of One-Family Home Owners
in Two Cities

                OWNERS              RENTERS
CITY      ACTUAL   EXPECTED   ACTUAL   EXPECTED   TOTAL
A          110        95        90       105       200
B           80        95       120       105       200
Total      190       190       210       210       400

In the two cities combined, the frequencies of owners and renters are in the
ratio of 190 to 210. Thus, if no relationship exists, the distribution would
be 95 owners and 105 renters in each city. A discrepancy of 15 exists in
each cell of the table between actual and expected frequencies. Chi
square is computed from the formula


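$$\chi^2 = \sum \frac{(\text{Actual Frequency} - \text{Expected Frequency})^2}{\text{Expected Frequency}}$$

thus, in the example,

$$\chi^2 = \frac{(110 - 95)^2}{95} + \frac{(80 - 95)^2}{95} + \frac{(90 - 105)^2}{105} + \frac{(120 - 105)^2}{105} = 9.023$$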


To evaluate the probability that a chi-square value might arise from
chance, the number of degrees of freedom involved in the computation
must be considered. Other things being equal, the greater the number of
cells in a contingency table, the larger the corresponding value of chi
square will tend to be. The degrees of freedom for chi-square values based
upon row and column totals in a contingency table depend upon the number
of cells in the table which can be filled with expected frequencies without
disturbing the row and column totals of the table, rather than on the
total number of cells in the table. In the example shown, if no informa-
tion other than the row and column totals appearing in Table 33 is
known, then the expected frequency of owners in City A would be


$$\frac{\text{Total Sample, City A}}{\text{Total Sample, Both Cities}} \times \text{Total Owners}$$

or 200/400 of 190, which equals 95. If 95 is inserted as an expected frequency
for owners in City A, then all other expected frequencies in the table will
automatically be determined, since the totals of both rows and columns
are unchanged. Therefore, only a single cell in a 2 x 2 table need be filled,
and the frequencies for the other cells may be obtained by subtraction
from row and column totals. In a 2 x 2 table the number of degrees of
freedom is one whenever the expected frequencies are obtained from the


row and column totals. Consulting the table of chi square it is found 
that with one degree of freedom a value of 9.023 or more would be ex- 
pected in less than 1 per cent of samples if there were no differences in 
home ownership between cities. Evidence has been found that ownership 
of one-family residences is more prevalent in City A than City B, or, 
conversely, that renting of one-family residences is more prevalent in 
City B than City A. 
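The degrees-of-freedom argument can be made concrete by computing one expected frequency from the margins and obtaining the other three by subtraction. The sketch below assumes a 2 x 2 table given as lists of row and column totals. `expected_2x2_by_subtraction` is a hypothetical name.

```python
def expected_2x2_by_subtraction(row_totals, col_totals):
    """Expected frequencies for a four-cell table from its margins.

    Only the upper-left cell is computed from the totals; the other three
    follow by subtraction, which is why a 2 x 2 table has one degree of
    freedom.
    """
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must have the same sum")
    e11 = row_totals[0] * col_totals[0] / n   # e.g. (200/400) * 190 = 95
    e12 = row_totals[0] - e11
    e21 = col_totals[0] - e11
    e22 = row_totals[1] - e21
    return [[e11, e12], [e21, e22]]


rows, cols = [200, 200], [190, 210]            # Table 33 margins
expected = expected_2x2_by_subtraction(rows, cols)
# Each cell found by subtraction agrees with row total * column total / N.
for i in range(2):
    for j in range(2):
        assert abs(expected[i][j] - rows[i] * cols[j] / sum(rows)) < 1e-9

observed = [[110, 90], [80, 120]]              # owners, renters by city
chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))
print(expected, round(chi2, 3))
```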


TABLE 34. Symbolical Designation of Cells and
Totals of a Four-Cell Contingency Table

STUB ITEMS       HEADINGS        TOTAL
                 a       b        a+b
                 c       d        c+d
Total           a+c     b+d        N


When a four-cell contingency table is being considered, the calculation
of chi square can be simplified. If a, b, c, and d are used to designate the
frequencies in the four cells of the table, the four border totals can be
represented as in Table 34. The chi-square value can be found by the
formula:

$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$$


After substitution of the data in Table 33,

$$\chi^2 = \frac{400\,[(110)(120) - (90)(80)]^2}{(110 + 90)(80 + 120)(110 + 80)(90 + 120)} = 9.023$$


The same value has been found without computing the expected fre- 
quencies as such. 
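The four-cell shortcut translates directly into code. This sketch takes the cells in the a, b, c, d layout of Table 34, and the function name `chi_square_four_cell` is illustrative. Python's integers keep the arithmetic exact until the final division.

```python
def chi_square_four_cell(a, b, c, d):
    """N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)], cells laid out as in Table 34."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


print(round(chi_square_four_cell(110, 90, 80, 120), 3))   # Table 33
print(round(chi_square_four_cell(75, 63, 122, 111), 3))   # Table 36
```

The shortcut is algebraically identical to the cell-by-cell formula with unrounded expected frequencies. For the 4-H data of Table 36 it still agrees with the hand computation to three decimals (0.137).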

In making surveys it is unusual to find data which can be as easily
manipulated as those shown in Table 32. More often ownership tables are
difficult to treat arithmetically, even though the principle is the same.
For example, a survey of rural homes revealed the ownership data shown
in Table 35 when only those families with children 10 years old or older
were considered.

TABLE 35. Ownership Among 4-H and Non-4-H Families

                 OWNER               TENANT
FAMILY       N    PER CENT      N    PER CENT
4-H          75    54.35        63    45.65
Non-4-H     122    52.36       111    47.64

From this table it can be seen that ownership is slightly more prevalent
among 4-H families. To evaluate the probability that such a discrepancy
might have resulted from sampling fluctuation, Table 36 may be prepared,
although it would probably be unnecessary to include it in a final
research report.


TABLE 36. Actual and Expected Frequencies of Owners Among 4-H and
Non-4-H Families

                   OWNER                TENANT
FAMILY      ACTUAL   EXPECTED    ACTUAL   EXPECTED    TOTAL
4-H            75      73.28        63      64.72       138
Non-4-H       122     123.72       111     109.28       233
Total         197     197.00       174     174.00       371


The expected frequencies are computed as usual. The expected frequency
of owners among 4-H families can be found by taking 138/371 of 197, which
equals 73.28. The remaining expected frequencies may be obtained in a
similar fashion, or they can be obtained by subtraction from the row and
column totals. Chi square will be

$$\chi^2 = \frac{(75 - 73.28)^2}{73.28} + \frac{(63 - 64.72)^2}{64.72} + \frac{(122 - 123.72)^2}{123.72} + \frac{(111 - 109.28)^2}{109.28} = 0.137$$


When the table of chi square is consulted it can be seen that with one 
degree of freedom a chi square of 0.137 or larger will be expected in more 
than 70 per cent of samples, even though no population differences actu- 
ally exist among 4-H and non-4-H families in regard to farm ownership. 

Small-Cell Frequency in Four-Cell Tables. The chi-square technique,
useful though it may be, is not satisfactory when the sample is small,
particularly whenever the expected frequency in any cell in a 2 x 2 table
is less than 5. However, it is possible to obtain satisfactory inferences in
many cases by using an adaptation proposed by Yates,¹ which consists
of adding 1/2 case to the smallest frequency of the 2 x 2 table and adjust-
ing all other frequencies so that the row and column totals will remain
the same. Unfortunately, this correction is applicable only to four-cell
tables.

¹ F. Yates, “Contingency Tables Involving Small Numbers and the χ² Test,”
Supplement to the Journal of the Royal Statistical Society, 1:217-235, 1934.

An example of small frequencies in a 2 x 2 table is shown in Table 37A, in
which it is desired to know whether the ability to pass a particular course
is a sex characteristic. In Table 37B one-half a case has been added to
the frequency of females failing the course and the other frequencies
adjusted accordingly so that the row and column totals are unchanged.
Chi square is then computed from the adjusted and expected data as
follows:


$$\chi^2 = \frac{(15.5 - 16)^2}{16} + \frac{(4.5 - 4)^2}{4} + \frac{(8.5 - 8)^2}{8} + \frac{(1.5 - 2)^2}{2} = 0.234$$


No sex difference in ability to pass the course has been demonstrated 
from the available data. 
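The adjustment just described can be sketched as follows. The table is assumed to be stored as nested lists, with rows for males and females and columns for passing and failing (the column labels are an assumption about Table 37). `yates_adjust` and `expected_from_margins` are hypothetical helper names. The sketch follows the text's procedure literally by raising the smallest cell by one-half. It therefore presumes, as in Table 37, that the smallest actual frequency lies below its expected frequency.

```python
def yates_adjust(table):
    """Add 1/2 to the smallest cell of a 2 x 2 table and shift the other
    cells by 1/2 so that row and column totals are unchanged."""
    adjusted = [[float(x) for x in row] for row in table]
    i, j = min(((r, c) for r in range(2) for c in range(2)),
               key=lambda rc: table[rc[0]][rc[1]])
    adjusted[i][j] += 0.5
    adjusted[i][1 - j] -= 0.5
    adjusted[1 - i][j] -= 0.5
    adjusted[1 - i][1 - j] += 0.5
    return adjusted


def expected_from_margins(table):
    """Row total * column total / N for every cell."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


actual = [[15, 5], [9, 1]]                   # Table 37A: male, female x passed, failed
adjusted = yates_adjust(actual)              # Table 37B
expected = expected_from_margins(actual)     # Table 37C
chi2 = sum((adjusted[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))
print(adjusted, round(chi2, 3))
```

In this example the result matches the more familiar statement of Yates' correction, which reduces each |actual - expected| by one-half. Every discrepancy in Table 37A is 1, so each corrected discrepancy is 0.5, exactly as in the adjusted table.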

TABLE 37. Sex of Passing and Failing Students

A. Actual Data
SEX       PASSED   FAILED   TOTAL
Male         15        5       20
Female        9        1       10
Total        24        6       30

B. Adjusted Data
SEX       PASSED   FAILED   TOTAL
Male       15.5      4.5       20
Female      8.5      1.5       10
Total      24.0      6.0       30

C. Expected Data
SEX       PASSED   FAILED   TOTAL
Male         16        4       20
Female        8        2       10
Total        24        6       30

Careful research workers hesitate to test hypotheses with so few cases
as are shown in Table 37. With such a small number of cases it is
extremely difficult to demonstrate significant departure from the null
hypothesis, even though departures from expected frequencies are
proportionately quite extreme. It is interesting to note that if the sample
size in Table 37 were increased 10 times and the increase were distributed
proportionately among the cells of the table, the chi square (computed
without the adjustment) would be 9.375, a value far in excess of that
which is required for rejecting the null hypothesis.
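The figure of 9.375 is ten times the chi square of the unadjusted actual data in Table 37A, which is 0.9375. The four-cell formula shows why. Multiplying every cell by k multiplies N by k, and it multiplies both (ad - bc)² and the product of the marginal totals by k⁴, so chi square itself is multiplied by k. Here is a short check (the helper name is hypothetical):

```python
def chi_square_four_cell(a, b, c, d):
    """N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)], cells laid out as in Table 34."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


actual = (15, 5, 9, 1)                   # Table 37A, no adjustment
for k in (1, 10):
    scaled = [k * x for x in actual]     # same proportions, k times the cases
    print(k, sum(scaled), chi_square_four_cell(*scaled))
# 1 30 0.9375
# 10 300 9.375
```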

Multiple-Cell Contingency Tables. Testing hypotheses with the chi-
square technique is not limited to 2 x 2 contingency tables; it may
also be applied to multiple-cell tables. The principles of computation
are similar, the chief difference arising from the number of degrees of
freedom. The number of degrees of freedom is the number of cells which
must be filled with expected frequencies before the remaining cells can
be filled by subtraction from the given row and column totals. From
Table 38 it should be evident that in a 2 x 3 table, two cells in either row
can be filled with expected frequencies and the others are automatically
fixed by the totals. Thus, there are two degrees of freedom in a 2 x 3
table. In a similar fashion, there are four degrees of freedom in a 3 x 3
table and five degrees of freedom in a 2 x 6 table. The number of degrees
of freedom in any contingency table may be found from (r - 1)(c - 1),
where r is the number of rows and c is the number of columns. This
relationship always holds whenever the most likely expected frequencies
are obtained from row and column totals.

TABLE 38. Multiple-Cell Contingency Tables
[Table entries not recoverable from the scanned source.]

An example of the use of chi square with multiple-cell frequency tables
is shown in Table 39, in which the state and out-of-state residences of
10,589 students enrolled in summer schools of a midwestern state are
indicated. The problem immediately becomes one of finding whether
different institutions, for some reason, attract different ratios of state to
out-of-state students.

TABLE 39. Summer School Attendance at State Institutions
in a Midwestern State

                          ACTUAL                         EXPECTED
INSTITUTION     STATE   OUT-OF-STATE   TOTAL      STATE   OUT-OF-STATE
College A        1515       1131        2646       2069        577
College B        1398        630        2028       1585        443
College C        1560         64        1624       1270        354
College D        3806        485        4291       3355        936
Total            8279       2310       10589       8279       2310

The null hypothesis is postulated, i.e., these colleges present no
differences in ratio of state to out-of-state enrollment beyond those which
might be expected from sampling fluctuation. For convenience, the expected
frequencies are shown on the right in Table 39. The usual computation for
chi square is as follows:


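$$
\begin{aligned}
\chi^2 &= \frac{(1515 - 2069)^2}{2069} + \frac{(1131 - 577)^2}{577} + \frac{(1398 - 1585)^2}{1585} + \frac{(630 - 443)^2}{443} \\
&\quad + \frac{(1560 - 1270)^2}{1270} + \frac{(64 - 354)^2}{354} + \frac{(3806 - 3355)^2}{3355} + \frac{(485 - 936)^2}{936} \\
&= 1362.982
\end{aligned}
$$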


In evaluating this value of chi square, there are three degrees of freedom,
since the product of one less than the number of rows and one less than
the number of columns is (4 - 1)(2 - 1) = 3. The computed value of chi
square, 1362.982, is far beyond the value of 7.815 which is required for
significance at the 5 per cent level with three degrees of freedom. The
hypothesis that 8279 state and 2310 out-of-state students would be
distributed among the colleges at random is not tenable. The null hypothesis
is rejected, and some other factor or factors must furnish the clues for the
variation among colleges in the ratios of state to out-of-state students.
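The same computation for a table of any size can be sketched as below. `chi_square_contingency` is a hypothetical helper. It forms each expected frequency as (row total x column total)/N and returns (r - 1)(c - 1) as the degrees of freedom. Applied to the actual data of Table 39, it gives a value near 1363 with three degrees of freedom. The small difference from 1362.982 arises because the expected frequencies in Table 39 were rounded to whole students.

```python
def chi_square_contingency(observed):
    """Return (chi square, degrees of freedom) for an r x c table.

    Each expected frequency is row total * column total / N, and the
    degrees of freedom are (r - 1)(c - 1).
    """
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    chi2 = 0.0
    for i, r_total in enumerate(row_totals):
        for j, c_total in enumerate(col_totals):
            e = r_total * c_total / n
            chi2 += (observed[i][j] - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Table 39, actual data: rows College A-D; columns state, out-of-state
table_39 = [[1515, 1131], [1398, 630], [1560, 64], [3806, 485]]
chi2, df = chi_square_contingency(table_39)
print(round(chi2, 1), df)
```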

Since the chi-square value is significant the investigator may wish 
to identify the college or colleges whose actual summer school enrollments 
have deviated greatly from the expected enrollments and thereby con- 
tributed appreciably to the size of the chi-square value. Inspection of the 
data in Table 39 indicates that Colleges A, C, and D have enrollments 
which vary considerably from expected enrollments. Of the three, Col- 
lege A differs the most. This observation suggests the null hypothesis: 
there is no difference between College A and the other state colleges in 
the ratios of state and out-of-state summer school enrollment. Tech- 
nically, to test this hypothesis a new sample should be drawn. However, 
tentative inferences can be obtained from the data already available. 
The data in Table 39 can be regrouped as shown in Table 40. 


TABLE 40. Summer School Attendance of College A
and Other State Institutions

                          ACTUAL                         EXPECTED
INSTITUTION     STATE   OUT-OF-STATE   TOTAL      STATE   OUT-OF-STATE
College A        1515       1131        2646       2069        577
All others       6764       1179        7943       6210       1733
Total            8279       2310       10589       8279       2310

$$\chi^2 = \frac{(1515 - 2069)^2}{2069} + \frac{(1131 - 577)^2}{577} + \frac{(6764 - 6210)^2}{6210} + \frac{(1179 - 1733)^2}{1733} = 906.781$$


This chi-square value with one degree of freedom is highly significant.
The null hypothesis can then be tentatively rejected. This procedure could
be repeated for any other college or colleges. However, it should be
remembered that the procedure is meaningful only when the chi-square
value computed from the original problem is found to be significant.
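Regrouping Table 39 into Table 40 amounts to keeping one row and summing the rest. The sketch below uses the hypothetical helpers `collapse_rows` and `chi_square_four_cell`. It yields a value near 906 with one degree of freedom. The text's 906.781 was computed from rounded expected frequencies.

```python
def collapse_rows(observed, keep_index):
    """Return a two-row table: the chosen row versus the sum of all others."""
    kept = list(observed[keep_index])
    others = [sum(col) for col in zip(*(row for i, row in enumerate(observed)
                                        if i != keep_index))]
    return [kept, others]


def chi_square_four_cell(a, b, c, d):
    """N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)], cells laid out as in Table 34."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


table_39 = [[1515, 1131], [1398, 630], [1560, 64], [3806, 485]]
table_40 = collapse_rows(table_39, keep_index=0)   # College A vs all others
(a, b), (c, d) = table_40
chi2_a = chi_square_four_cell(a, b, c, d)
print(table_40, round(chi2_a, 1))
```

Passing a different `keep_index` repeats the comparison for any other college. As the text cautions, this is meaningful only after the full table has proved significant.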
Small-Cell Frequency in Multiple-Cell Tables. In some investigations
certain cells in a multiple-cell contingency table will contain a small
number of cases. Chi square is not satisfactory in such instances.
Prevailing practices suggest that chi square should not be employed
whenever any expected cell frequency falls below five. In many situations
this limiting nature of chi square may be overcome by combining categories
with small frequencies or by eliminating the row or column containing
expected cell frequencies of less than five. Again, it should be noted that
the Yates correction, so useful in a 2 x 2 table, is not applicable to a
multiple-cell table.


The data on fiction preferences of high school boys enrolled in different
curricula shown in Table 41 represent an example of small expected
frequencies in a multiple-cell contingency table.

TABLE 41. Fiction Preferences and Curricula of High School Boys

                            HIGH SCHOOL CURRICULUM
FICTION          COLLEGE
PREFERENCE     PREPARATORY   GENERAL   VOCATIONAL   TOTAL
Historical          13           3           1        17
Adventure           20          38          26        84
Romance             22          26          19        67
Mystery             21          20          30        71
Western             20          38          46       104
Total               96         125         122       343

Since the expected frequency in one of the cells is less than five (for
historical preference among college preparatory students it is 17 x 96/343,
or about 4.76), chi square is inappropriate unless historical preferences are
eliminated or combined with another type of preference.

In Table 42, actual and expected frequencies are shown, first with
historical preference eliminated and then with historical preference
combined with adventure.

TABLE 42. Actual and Expected Frequencies with Historical Eliminated
and with Historical Combined with Adventure

                       ACTUAL FREQUENCIES              EXPECTED FREQUENCIES
FICTION           COLLEGE   GEN-   VOCA-             COLLEGE    GEN-     VOCA-
PREFERENCE         PREP.    ERAL   TIONAL   TOTAL     PREP.     ERAL     TIONAL

Historical eliminated
Adventure            20      38      26       84      21.39     31.43     31.18
Romance              22      26      19       67      17.06     25.07     24.87
Mystery              21      20      30       71      18.08     26.57     26.35
Western              20      38      46      104      26.47     38.93     38.60
Total                83     122     121      326      83.00    122.00    121.00

Historical combined with adventure
Historical-Adventure 33      41      27      101      28.27     36.81     35.92
Romance              22      26      19       67      18.75     24.42     23.83
Mystery              21      20      30       71      19.87     25.87     25.26
Western              20      38      46      104      29.11     37.90     36.99
Total                96     125     122      343      96.00    125.00    122.00

Whenever a combination of rows or columns is made in this manner, the
investigator must decide which of the possible combinations is the most
satisfactory. If it can be assumed that the combination of historical and
adventure is appropriate, then chi square can be computed as usual. The
investigator rather than the statistician, however, is usually better
equipped to make such a decision.

As can be seen from Table 42, the chi squares computed after elimination and after combination have the same number of degrees of freedom, in this example (4 − 1)(3 − 1) = 6. When the appropriate data are substituted into the formula, chi square for the table with historical preference eliminated is found to be 10.799. With six degrees of freedom this value is smaller than the 12.592 required for significance at the 5 per cent level. When chi square is computed for the table with historical combined with adventure, however, it is found to be 12.460, a value very close to that required for significance.
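The effect of the two choices can be checked with a short Python sketch, assuming each table is entered as a list of rows; the helper names `expected_frequencies` and `chi_square` are illustrative, not taken from the text. It first confirms that the expected frequency for historical preference in the college preparatory curriculum is below five, then computes chi square with the historical row dropped and with it merged into the adventure row. Because the expected frequencies are carried at full precision rather than to two decimals, the results may differ from 10.799 and 12.460 in the last decimal place.

```python
# Pearson chi square for an r x c contingency table. Each expected
# frequency is (row total) * (column total) / N.

def expected_frequencies(table):
    """Expected frequencies for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Return (chi square, degrees of freedom)."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Table 41: rows Historical, Adventure, Romance, Mystery, Western;
# columns College Preparatory, General, Vocational.
table_41 = [
    [13, 3, 1],
    [20, 38, 26],
    [22, 26, 19],
    [21, 20, 30],
    [20, 38, 46],
]

smallest = min(min(row) for row in expected_frequencies(table_41))
print(f"smallest expected frequency in Table 41: {smallest:.2f}")

# Option 1: drop the Historical row.
eliminated = table_41[1:]
# Option 2: add the Historical row into the Adventure row.
combined = [[h + a for h, a in zip(table_41[0], table_41[1])]] + table_41[2:]

for label, t in (("eliminated", eliminated), ("combined", combined)):
    chi2, df = chi_square(t)
    print(f"{label}: chi square = {chi2:.3f} with {df} df")
```

Both versions have six degrees of freedom, so the comparison with the tabled 12.592 is the same in each case.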

In most research studies the conclusion will not be altered by the decision either to eliminate or to combine. The possibility that the two methods could lead to different conclusions suggests that the investigator should compute chi square both ways before deciding whether to eliminate or to combine. If the conclusion reached is the same, the decision will rest on the logic of the appropriateness of combination. Whenever the conclusions differ and the combination is defensible, the investigator should probably report the results of both analyses.

In many tables a miscellaneous or a no-data category is included in the classification. Whether to include this category in the statistical analysis or to omit it must be a matter of good judgment. In some situations it is more satisfactory to omit such a category from the computation, whereas in other situations the reverse is true. If the investigator is in doubt, he will probably compute chi square both ways. When both results yield the same interpretation, as will usually be the case, the safest practice is to report that fact. Whenever the interpretations differ, on the other hand, the investigator is faced with deciding which is the more important interpretation.

Direct Solution of Chi Square in Multiple-Cell Tables. A method for the solution of chi square in multiple-cell contingency tables which affords considerable savings in time has been proposed by Leslie.¹ Essentially he has changed the usual χ² formula from

$$\chi^2 = \sum \frac{\left(f - \frac{T_r T_c}{N}\right)^2}{\frac{T_r T_c}{N}}$$

in which $f$ is the frequency in any cell and $T_r$ and $T_c$ are the row and column totals appropriate for that cell, to its mathematical identity²
¹ P. H. Leslie, “The Calculation of Chi Square for an r × c Contingency Table,” Biometrics, 7, pp. 283-286, 1951.

² The usual formula upon expansion may be written

$$\chi^2 = \sum \frac{N f^2}{T_r T_c} - 2\sum f + \sum \frac{T_r T_c}{N}$$

(Footnote continued on next page.)


160 STATISTICAL METHODS 


$$\chi^2 = N\left(\sum \frac{f^2}{T_r T_c} - 1\right)$$


To illustrate the use of the Leslie modification, the distribution of re- 
sponses of rural and urban boys to an item on a personality questionnaire 
will be used. The observed frequencies are shown in Table 43. 


TABLE 43. Responses of Rural and Urban Boys to a
Personality Questionnaire Item

                     LOCATION
RESPONSE       RURAL    URBAN    TOTAL
Yes              33       37       70
Uncertain        22       14       36
No               85       69      154
Total           140      120      260


In the application of the method proposed by Leslie, a work table is constructed consisting of the squared frequencies of the cells and the reciprocals of the row and column totals. A work table is particularly useful with this method of computing chi square. Such a work sheet for the data in Table 43 is shown in Table 44.

TABLE 44. Computational Work Sheet for the Direct Solution of Chi Square

                (1)             (2)           (3)            (4)              (5)
RESPONSE    RECIPROCALS      SQUARED FREQUENCIES
            OF ROW           RURAL         URBAN
            TOTALS         (0.007143)    (0.008333)     Σ f²/T_c       Σ f²/(T_r T_c)
Yes          0.014286         1089          1369        19.186604        0.274100
Uncertain    0.027778          484           196         5.090480        0.141408
No           0.006494         7225          4761        91.281588        0.592874
Total                                                                    1.008382

χ² = 260(1.008382 − 1) = 2.179

As seen from the formula, the work sheet must provide for the square of the frequency in each cell, together with the reciprocal of each row and column total. The squares of the frequencies are shown in Columns (2) and (3) of the work sheet for each of the six cells formed by the two locations and the three possible item responses. The necessary reciprocals can be obtained from tables or by dividing each of the totals, both rows and columns, into unity. The reciprocals for the columns (rural and urban) are shown in parentheses under the rural and urban headings; they were obtained by dividing unity by the number of cases classified as rural and urban, 140 and 120, respectively.

² (Continued.) When summed for all cells, 2Σf equals 2N, and the last term is the sum of the expected or theoretical frequencies, or N. The formula therefore reduces to

$$\chi^2 = \sum \frac{N f^2}{T_r T_c} - 2N + N = N\left(\sum \frac{f^2}{T_r T_c} - 1\right)$$


The entries needed in column (4), i.e., Σ f²/T_c, are obtained as follows: for the “yes” response, (0.007143)(1089) + (0.008333)(1369) = 19.186604; for the “uncertain” response, (0.007143)(484) + (0.008333)(196) = 5.090480; for the “no” response, (0.007143)(7225) + (0.008333)(4761) = 91.281588.

The entries needed in column (5), i.e., Σ f²/(T_r T_c), are found by multiplying the entries in column (1) by those in column (4). When the entries in column (5) are summed, substitution may be made directly in the chi-square formula

$$\chi^2 = N\left(\sum \frac{f^2}{T_r T_c} - 1\right) = 260(1.008382 - 1) = 2.179$$


The solution just shown has been completed on the assumption that it is satisfactory to round the reciprocals to six decimal places. The solution was also made, though it is not shown here, with reciprocals carried to as many decimals as could be obtained on a ten-bank calculator, yielding a chi square of 2.143. The difference between these two chi-square values may well be disregarded. With (3 − 1)(2 − 1) = 2 degrees of freedom, either chi square is well below the 5.991 required for significance at the 5 per cent level. Evidence is lacking from these data to demonstrate that rural and urban boys react differently to this item.
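The work sheet can be mirrored in a minimal Python sketch of Leslie's identity, assuming the table is given as a list of rows; `leslie_chi_square` and its `decimals` argument are illustrative names. With full floating-point precision it gives about 2.143, the value obtained on the calculator. With reciprocals rounded to six places it gives about 2.154 rather than the printed 2.179: the printed column (5) entries for the “uncertain” and “no” rows (0.141408 and 0.592874) differ slightly from the products of columns (1) and (4), about 0.141403 and 0.592783, which appears to account for most of that gap. Either way the conclusion is unchanged.

```python
# Leslie's direct form: chi2 = N * (sum over cells of f^2 / (T_r * T_c) - 1).

def leslie_chi_square(table, decimals=None):
    """Chi square for an r x c table by Leslie's identity.

    If `decimals` is given, the row and column reciprocals are rounded to
    that many places, imitating a hand work sheet; otherwise full float
    precision is used.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    def recip(total):
        r = 1.0 / total
        return round(r, decimals) if decimals is not None else r

    row_recips = [recip(t) for t in row_totals]
    col_recips = [recip(t) for t in col_totals]

    column_5_sum = 0.0
    for row, r_recip in zip(table, row_recips):
        # Column (4): sum across the row of f^2 times the column reciprocal.
        col4 = sum(f * f * c_recip for f, c_recip in zip(row, col_recips))
        # Column (5): column (4) times the row reciprocal, accumulated.
        column_5_sum += r_recip * col4
    return n * (column_5_sum - 1.0)


# Table 43: rows Yes, Uncertain, No; columns Rural, Urban.
table_43 = [[33, 37], [22, 14], [85, 69]]

exact = leslie_chi_square(table_43)
six_places = leslie_chi_square(table_43, decimals=6)
print(f"full precision: {exact:.3f}")
print(f"six-decimal reciprocals: {six_places:.3f}")
```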

Caution should be used in assuming from the foregoing example that six-decimal reciprocals will always yield a satisfactory approximation of chi square. Rounding errors are greatly magnified when they are multiplied by the squares of the cell frequencies, and as the row and column totals increase, the number of decimal places needed in the reciprocals increases rapidly. Perhaps the safest rule to follow is to use as many digits as can be obtained on the available calculator. Such a procedure should assure the needed accuracy.

The Leslie modification for computing chi square can be extended to multiple-cell contingency tables of any size. Although this method is considerably faster than the conventional method of computation, it has the disadvantage that the contribution of each individual cell to chi square is not apparent by inspection. It seems unnecessary to add that no one should attempt to solve for chi square by this method unless a calculator is available.

Tables with More Than 30 Degrees of Freedom. Most tables involved in actual research will have fewer than 30 degrees of freedom, the maximum for which values are recorded in the chi-square table. Whenever it is desirable to evaluate the probability of a chi square with more than 30 degrees of freedom, a table of the normal curve may be used. The normal deviate may be found by

$$z = \sqrt{2\chi^2} - \sqrt{2(df) - 1}$$

and the probability read from the table of normal curve areas.

If a chi square of 72.600 is obtained in a 26 × 3 table, which has (26 − 1)(3 − 1) = 50 degrees of freedom, what is the probability that the frequencies in the table would be so distributed by chance even though no relationship actually exists? Substituting in the foregoing formula,

$$z = \sqrt{(2)(72.600)} - \sqrt{(2)(50) - 1} = 2.1$$

A table of the normal curve shows that an area of 0.4821 lies between the mean and a normal deviate of 2.1. Thus, 0.0179 of the area lies in the tail of the distribution beyond z; this is the probability that a chi square this large or larger could have resulted from sampling fluctuation alone if no relationship exists. This probability is 1.79 per cent, so the null hypothesis would be rejected at the 5 per cent level, although not at the 1 per cent level.


In some situations chi square will be so small that z carries a minus sign. In this case the area under the normal curve shown in the table is added to 50 per cent. For example, a chi square of 42.530 in a 25 × 3 table, with (25 − 1)(3 − 1) = 48 degrees of freedom, yields a negative z:

$$z = \sqrt{2\chi^2} - \sqrt{2(df) - 1} = \sqrt{(2)(42.530)} - \sqrt{(2)(48) - 1} = -0.524$$

As seen from a table of the normal curve, a z value of 0.524 accounts for approximately 20 per cent of the area between the mean and the deviate. Since z is negative, the area in the sampling tail of the distribution is 50 per cent plus 20 per cent, or about 70 per cent. Thus, discrepancies this great or greater would be expected in about 70 per cent of samples drawn from a population in which no relationship exists.
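The normal approximation above (often attributed to R. A. Fisher) is easy to script. This Python sketch uses `math.erfc` for the normal tail area in place of a printed table; the function name is illustrative. It reproduces the two examples just given and also the 99-county example with 98 degrees of freedom that follows.

```python
import math


def chi_square_normal_approx(chi2, df):
    """Normal approximation for chi square with many degrees of freedom.

    z = sqrt(2 * chi2) - sqrt(2 * df - 1); the returned probability is the
    normal area above z, i.e. the chance of a chi square at least this
    large when no relationship exists.
    """
    z = math.sqrt(2 * chi2) - math.sqrt(2 * df - 1)
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z)
    return z, upper_tail


# (chi square, degrees of freedom) pairs from the text: a 26 x 3 table,
# a 25 x 3 table, and the 99-county table of boys and girls (df = 98).
for chi2, df in ((72.600, 50), (42.530, 48), (87.57, 98)):
    z, p = chi_square_normal_approx(chi2, df)
    print(f"chi square {chi2} with {df} df: z = {z:.3f}, P = {p:.4f}")
```

A negative z needs no special handling here: the upper-tail area is then simply greater than one half, which is the "add to 50 per cent" rule of the text.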

The numbers of boys and girls who were graduated from the eighth grade in Iowa schools in one year are shown by counties in Table 45. The data shown in this table represent a complete enumeration for the state for one year and, as such, may be considered a population. In such a case no test of significance would seem appropriate. On the other hand, if a hypothetical infinite population is postulated, the total state census may be thought of as a sample. Chi square can then be used to test whether the sex ratio of these pupils differed from county to county by more than might be expected if the sex of the graduates had been assigned at random.

TABLE 45. Sex of Eighth-Grade Graduates of Iowa Schools by County

COUNTY           BOYS  GIRLS   TOTAL    COUNTY           BOYS  GIRLS   TOTAL
Adair              97    111     208    Johnson           206    172     378
Adams              70     80     150    Jones             135    138     273
Allamakee         113    133     246    Keokuk            134    129     263
Appanoose         223    206     429    Kossuth           192    173     365
Audubon            99     90     189    Lee               262    244     506
Benton            204    161     365    Linn              955    901   1,856
Black Hawk        499    502   1,001    Louisa            117     90     207
Boone             210    218     428    Lucas             122    114     236
Bremer            125    144     269    Lyon              131    127     258
Buchanan          143    165     308    Madison           103    123     226
Buena Vista       170    154     324    Mahaska           209    197     406
Butler            157    157     314    Marion            224    216     440
Calhoun           135    139     274    Marshall          247    252     499
Carroll            96    101     197    Mills             104    108     212
Cass              156    158     314    Mitchell          103    119     222
Cedar             116    139     255    Monona            174    161     335
Cerro Gordo       322    344     666    Monroe            133    122     255
Cherokee          136    137     273    Montgomery        108    124     232
Chickasaw          99     98     197    Muscatine         208    218     426
Clarke             86     83     169    O'Brien           159    139     298
Clay              149    147     296    Osceola            92     87     179
Clayton           196    170     366    Page              157    182     339
Clinton           270    266     536    Palo Alto         126    117     243
Crawford          147    158     305    Plymouth          164    166     330
Dallas            206    197     403    Pocahontas        123    145     268
Davis             101     82     183    Polk            1,349  1,399   2,748
Decatur           104    119     223    Pottawattamie     528    530   1,058
Delaware          140    145     285    Poweshiek         139    130     269
Des Moines        287    283     570    Ringgold           72     85     157
Dickinson          91     93     184    Sac               149    171     320
Dubuque           232    230     462    Scott             567    577   1,144
Emmet             118    123     241    Shelby            110    127     237
Fayette           216    223     439    Sioux             214    231     445
Floyd             133    144     277    Story             264    256     520
Franklin          122    125     247    Tama              159    176     335
Fremont           132    135     267    Taylor            133    119     252
Greene            128    126     254    Union             119    138     257
Grundy            113    125     238    Van Buren          82    100     182
Guthrie           147    178     325    Wapello           296    324     620
Hamilton          180    175     355    Warren            158    134     292
Hancock           151    151     302    Washington        128    134     262
Hardin            186    198     384    Wayne             130    132     262
Harrison          221    215     436    Webster           302    275     577
Henry             142    120     262    Winnebago         139    106     245
Howard             87     80     167    Winneshiek        140    125     265
Humboldt          125     99     224    Woodbury          700    759   1,459
Ida               164    121     285    Worth              95     82     177
Iowa              128    102     230    Wright            195    191     386
Jackson           160    146     306
Jasper            281    229     510    Total          19,054 19,012  38,066
Jefferson         135    122     257

When chi square is computed, it is found to be 87.57 with (99 − 1)(2 − 1) = 98 degrees of freedom. Substituting in the formula,

$$z = \sqrt{2\chi^2} - \sqrt{2(df) - 1} = \sqrt{(2)(87.57)} - \sqrt{(2)(98) - 1} = -0.73$$

In a table of areas under the normal curve, a normal deviate of 0.73 accounts for 0.2673, or 26.73 per cent, of the area between the mean and the deviate. Since the deviate is negative, the percentage in the sampling tail of the distribution is 50 + 26.73 = 76.73 per cent. Thus, the inference can be made that if there were no sex differences among counties, a sex distribution like the one shown, or one more extreme, would be expected in about three-fourths of the samples obtained by allotting the sexes at random to the 99 counties in Iowa.

Limitations of the Application of Chi Square in Testing Hypotheses. The limitations of the application of chi square to data with small expected frequencies in one or more cells have already been mentioned. Other considerations should also govern the valid use of this procedure. In all applications the relative sizes of the border sums should be examined. If the row totals differ greatly from one another, e.g., one being 20 and another 10,000, the use of chi square would be open to question. The same is true of the column totals.

Mention has already been made that chi-square values should be computed only from data classified into mutually exclusive categories. This means that each case in the sample is located in one and only one cell. Chi square is sometimes erroneously applied to problems similar to the following. A group of 160 pupils of all class levels of a high school were asked on which evenings during the weekend they study. The responses are listed in Table 46.


TABLE 46. Weekend Study Evenings of High School Students
of All Class Levels

CLASS LEVEL     FRIDAY    SATURDAY    SUNDAY    TOTAL
Freshman          10          9          21        40
Sophomore         16         12          31        59
Junior            12         10          23        45
Senior            10          9          20        39
Total             48         40          95       183


Obviously some of the students studied on more than one evening and responded accordingly, since the 160 pupils gave 183 responses. The null hypothesis that there is no relationship between class level and the evening of study could under no circumstances be tested by chi square, since the categories are not mutually exclusive.

A less obvious example of data in which undesirable relationships may be present can be found in a survey of families in a community regarding the sex of the children and whether the children plan to attend college. The data might be classified as in Table 47.


TABLE 47. Sex and Decisions of Children
Concerning College Attendance

                 GOING TO COLLEGE
SEX              YES       NO       TOTAL
Male
Female
Total


Apparently the null hypothesis that there is no relationship between the sex of school-age children and their decision to attend college could be tested by computing a chi-square value. However, since the sample may include a large proportion of brothers and sisters, such a test is not appropriate. Because of the influence of such factors as economic status and parental attitude, brothers and sisters would tend to reach similar decisions, so the cases in the sample are not classified independently of one another. Interpretation of a chi-square value computed from these data is severely limited in that the value will be an underestimate. In the event that it is nonsignificant, its interpretation may be erroneous.

An examination of the computation of chi square reveals that the value obtained does not depend upon the order of arrangement of either the stub items or the headings in a table; the arrangement may be random. Treating the categories as if their order were arbitrary does not always bring out all the information which the data supply. For example, Table 48 contains information concerning participation in extracurricular activities and studentship. Studentship is clearly a continuum varying from low to high, and the full relationship can be brought out only when the high-to-low order of the stub items is taken into account. The data in Table 48 yield a chi square of 4.74 with two degrees of freedom, which fails to reach the 5 per cent level (5.991). Yet it can be shown that these data produce a significant relationship once the assumption that the order of the categories is arbitrary is abandoned.

TABLE 48. Participation in Activities and Studentship

                     PARTICIPATION IN ACTIVITIES
STUDENTSHIP               YES        NO
Good                       12         6
Average                    18        14
Below Average              10        18

More satisfactory treatment of data classified on a continuum will be considered later, by the more appropriate methods of analysis of variance. If chi square has been computed when one of the classifications is a variable characteristic and has been found significant, the more appropriate methods will yield even higher levels of significance. A nonsignificant chi-square value, however, does not guarantee that the same interpretation will follow from other types of analysis. The principle can be arrived at logically or by repeated trial that, except under the most unusual circumstances, chi square underestimates significance whenever a multiple-cell table has stub items or headings that can be thought of as lying on a continuum.
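The point about ordered categories can be illustrated with Table 48 in Python. The book defers such data to analysis of variance; the linear-trend chi square used here (the Cochran–Armitage form, with arbitrary scores 3, 2, 1 for good, average, and below average) is a different technique swapped in purely for illustration, not the authors' method. The ordinary chi square comes out near 4.74 with 2 degrees of freedom, short of 5.991, while the single trend degree of freedom gives about 4.55, beyond the 3.841 needed at the 5 per cent level.

```python
# Table 48: rows Good, Average, Below Average; columns Yes, No (participation).
table_48 = [[12, 6], [18, 14], [10, 18]]


def pearson_chi_square(table):
    """Ordinary chi square; ignores the order of the rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (f - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for f, c in zip(row, col_totals)
    )


def linear_trend_chi_square(table, scores):
    """Chi square with 1 df for a linear trend in the proportion falling
    in column 0 across ordered rows (Cochran-Armitage form)."""
    yes = [row[0] for row in table]
    n_i = [sum(row) for row in table]
    n = sum(n_i)
    p = sum(yes) / n
    t_stat = sum(t * (x - ni * p) for t, x, ni in zip(scores, yes, n_i))
    s1 = sum(ni * t for ni, t in zip(n_i, scores))
    s2 = sum(ni * t * t for ni, t in zip(n_i, scores))
    variance = p * (1 - p) * (s2 - s1 * s1 / n)
    return t_stat ** 2 / variance


overall = pearson_chi_square(table_48)
trend = linear_trend_chi_square(table_48, scores=[3, 2, 1])
print(f"ordinary chi square, 2 df: {overall:.2f} (5 per cent point 5.991)")
print(f"linear-trend chi square, 1 df: {trend:.2f} (5 per cent point 3.841)")
```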


GOODNESS OF FIT 


Chi square can be used to determine how well an actual frequency distribution fits some theoretical distribution, such as the normal distribution. This application of the chi-square technique, called “goodness of fit,” offers only an approximate test, but one which is satisfactory for most purposes. To illustrate the goodness-of-fit test, the 363 intelligence quotients shown in Table 49 will be used. The mean of these data is 105.6 and the standard deviation is 15.9. The expected frequencies for a “best-fitting” normal distribution can be obtained by redistributing the 363 cases according to the area under the normal curve represented by each interval. The best-fitting normal distribution is also forced to agree with the actual distribution with respect to mean and standard deviation.

TABLE 49. Distribution of 363 Intelligence Quotients

I.Q. INTERVAL    FREQUENCY
140-144               4
135-139              10
130-134              15
125-129              16
120-124              31
115-119              34
110-114              37
105-109              38
100-104              45
95-99                36
90-94                38
85-89                23
80-84                17
75-79                14
70-74                 5
Total               363

To determine the frequencies for each interval of the postulated normal distribution, i.e., the expected frequencies, the x/σ-distance of the lower theoretical limit of each interval is first computed. For example, the x/σ-distance of the lower theoretical limit of the highest interval is

$$\frac{139.5 - 105.6}{15.9} = 2.13$$

It is unnecessary to compute the x/σ-distance of the lower limit of the lowest interval, since the normal-curve area below that point is usually so small that it can be combined with the area in the lowest interval. Obviously any discrepancy arising from this procedure may be offset by the comparable treatment of the highest interval, which likewise absorbs the whole upper tail. When the x/σ-distances have been determined, the area under the normal curve between the ordinates at successive x/σ-distances is found from a table of ordinates and areas of the normal curve. When the number of cases in the sample is multiplied by the proportion of the area represented by each interval, the expected frequencies shown in Table 50 result.

TABLE 50. Computation of Expected Frequencies for Best-Fitting
Normal Distribution

LOWER LIMIT          x          x/σ-DISTANCE                EXPECTED FREQUENCY
OF INTERVAL    (X̄ = 105.6)      (σ = 15.9)        AREA         (area × N)
   139.5           33.9            2.13          0.0166            6.0
   134.5           28.9            1.82          0.0178            6.5
   129.5           23.9            1.50          0.0324           11.8
   124.5           18.9            1.19          0.0502           18.2
   119.5           13.9            0.87          0.0752           27.3
   114.5            8.9            0.56          0.0955           34.7
   109.5            3.9            0.25          0.1136           41.2
   104.5           −1.1           −0.07          0.1266           46.0
    99.5           −6.1           −0.38          0.1201           43.6
    94.5          −11.1           −0.70          0.1100           39.9
    89.5          −16.1           −1.01          0.0858           31.1
    84.5          −21.1           −1.33          0.0644           23.4
    79.5          −26.1           −1.64          0.0413           15.0
    74.5          −31.1           −1.96          0.0255            9.2
    69.5                                         0.0250            9.1

Chi square can then be computed in the manner already described:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = 12.580$$


The number of degrees of freedom associated with this chi-square value is 12, three less than the 15 intervals in the distribution, because the expected distribution was forced to agree with the actual one in number of cases, mean, and standard deviation. When a table of chi square is consulted, the value of 12.580 with 12 degrees of freedom is found to be well below the 21.026 required for significance at the 5 per cent level. As in other problems of estimation, the evidence is insufficient to indicate that the sample could not have come from a normally distributed population.

The I.Q.'s in the foregoing example were compared with an expected distribution having a mean, standard deviation, and number of cases equal to those of the actual distribution. It would have been possible to compare the actual distribution with a theoretical distribution having a mean and standard deviation obtained from some source other than the actual distribution. In that case the expected distribution is forced to agree with the obtained one in only one respect, number of cases, and only one degree of freedom is lost. Otherwise, the procedure is identical with that shown in Table 50.

TABLE 51. Computation of Expected Frequencies for Distribution
with Specified Characteristics

LOWER LIMIT          x          x/σ-DISTANCE                EXPECTED FREQUENCY
OF INTERVAL     (X̄ = 100)        (σ = 15)         AREA         (area × N)
   139.5           39.5            2.63          0.0043            1.6
   134.5           34.5            2.30          0.0064            2.3
   129.5           29.5            1.97          0.0137            5.0
   124.5           24.5            1.63          0.0272            9.9
   119.5           19.5            1.30          0.0452           16.4
   114.5           14.5            0.97          0.0692           25.1
   109.5            9.5            0.63          0.0983           35.7
   104.5            4.5            0.30          0.1178           42.8
    99.5           −0.5           −0.03          0.1299           47.1
    94.5           −5.5           −0.37          0.1323           48.0
    89.5          −10.5           −0.70          0.1137           41.3
    84.5          −15.5           −1.03          0.0905           32.8
    79.5          −20.5           −1.37          0.0662           24.0
    74.5          −25.5           −1.70          0.0407           14.8
    69.5                                         0.0446           16.2

For purposes of illustrating the goodness-of-fit test as a comparison of an actual distribution with one having a postulated mean and standard deviation which differ from those of the actual distribution, Table 51 has been prepared. The actual distribution is the same as that shown in Table 49. In this case, however, it is assumed that there is reason for postulating a mean of 100 and a standard deviation of 15 as the population characteristics. Obviously, any values which can be logically defended might have been postulated. Chi square computed from the data in Table 51 is found to be 85.390. This value far exceeds the 23.685 required at the 5 per cent level of confidence with 14 degrees of freedom. The available evidence indicates that it is highly improbable that the actual distribution is a random sample from a normally distributed population having a mean of 100 and a standard deviation of 15. As can be seen from Table 51, the expected frequencies in the higher intervals are smaller than those in Table 50, whereas the expected frequencies in the lower intervals are larger. This circumstance results from shifting the theoretical distribution lower on the scale and from decreasing the size of its standard deviation. In actual practice, it would have been desirable to combine the upper three intervals in Table 51 so that all expected frequencies would be at least 5. This procedure was not followed here because a direct comparison of Tables 50 and 51 in terms of degrees of freedom was desired.
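Tables 50 and 51 can be approximated with a Python sketch that uses `math.erfc` for normal areas instead of a printed table, assuming the intervals are listed from highest to lowest and that, as in the text, the top and bottom intervals absorb the two tails. The function names are illustrative. Because the x/σ-distances and areas are not rounded to two and four places, the chi squares may differ somewhat from the printed 12.580 and 85.390, most noticeably in the second case, where the smallest expected frequencies are very sensitive to rounding; the conclusions do not change.

```python
import math


def normal_cdf(z):
    """Cumulative area of the unit normal curve below z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def expected_normal(observed, lower_limits, mean, sd):
    """Expected interval frequencies under a normal curve.

    Intervals run from highest to lowest. The top interval absorbs the whole
    upper tail and the bottom interval the whole lower tail.
    """
    n = sum(observed)
    upper_cdf = 1.0  # cumulative area at the upper end of the current interval
    expected = []
    for i, lower in enumerate(lower_limits):
        is_last = i == len(lower_limits) - 1
        lower_cdf = 0.0 if is_last else normal_cdf((lower - mean) / sd)
        expected.append(n * (upper_cdf - lower_cdf))
        upper_cdf = lower_cdf
    return expected


def chi_square_fit(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Table 49, highest interval (140-144) first.
observed = [4, 10, 15, 16, 31, 34, 37, 38, 45, 36, 38, 23, 17, 14, 5]
lower_limits = [139.5 - 5 * k for k in range(len(observed))]  # 139.5 ... 69.5

best_fit = expected_normal(observed, lower_limits, mean=105.6, sd=15.9)
specified = expected_normal(observed, lower_limits, mean=100, sd=15)

# Best fit: N, mean, and SD taken from the data, so 3 df are lost.
print(f"Table 50: chi square = {chi_square_fit(observed, best_fit):.2f}, "
      f"df = {len(observed) - 3}")
# Specified mean 100 and SD 15: only N taken from the data, so 1 df is lost.
print(f"Table 51: chi square = {chi_square_fit(observed, specified):.2f}, "
      f"df = {len(observed) - 1}")
```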

The illustration of the goodness-of-fit technique in this section has involved obtaining the expected frequency for each interval from the normal distribution. The same procedure can be used for fitting other types of theoretical distributions to obtained data; the computational procedures are the same. It should be noted, however, that as in other problems of estimation, the goodness-of-fit analysis will be no more defensible than the reasoning upon which the selection of the particular theoretical distribution is based.


Exercises 


1. When the first 33 presidents of the United States are considered, their 
children have totaled 109; 71 have been boys and 38 have been girls. Assuming 
that the birth rate of boys to girls is 51 to 49, determine whether the numbers 
of presidents’ children of each sex differ from this ratio. Interpret your answer. 

2. Before World War II, the ratio of freshmen to sophomores to juniors to 
seniors at College X was 3 to 2.1 to 1.5 to 1. During a selected postwar year 
there were 1366 freshmen, 813 sophomores, 583 juniors, and 417 seniors. Does 
the postwar class level distribution of the student body differ from the prewar 
distribution? Interpret your answer. 

3. In a certain city, there are three large daily newspapers. A sample of 
readers who regularly read the editorial pages of all three was asked to indicate 
which editorial page they considered to be the best. Of the 1127 readers in the 
sample, 313 preferred Newspaper A, 497 preferred Newspaper B, whereas 317 
preferred Newspaper C. Is there sufficient evidence to indicate a difference 
among the newspapers with respect to the quality of their editorial page as 
evaluated by readers? Interpret your answer. 

4. Data from a follow-up study of a sample of former college students who did and did not work during the academic year so as to earn at least 50 per cent of their college expenses are shown in the following table.

                     GRADUATED FROM COLLEGE
WORKED FOR
EXPENSES             YES       NO      TOTAL
Yes                   38       47        85
No                    78      208       286
Total                116      255       371

Test the hypothesis that tendency to graduate from college is not related to whether a student works during the academic year to defray expenses. Interpret your answer.
5. After the installation of a new stop sign on a corner in a residential area, tabulations were made for four hours of the men and women drivers who observed and who ignored the sign.

                     OBSERVED THE SIGN
SEX OF DRIVER        YES      NO      TOTAL
Male                  45       8        53
Female                17       4        21

Test the hypothesis that there is no difference between male and female drivers in their tendency to observe the newly installed stop sign. Interpret your answer.
6. Ninth-grade boys classified on the basis of residence were asked whether they preferred to study printing or woodworking. The responses are listed below.

                 PREFERRED     PREFERRED
RESIDENCE        PRINTING      WOODWORKING    TOTAL
Rural farm          13             25           38
Rural nonfarm       15             31           46
Urban               12              9           21
Total               40             65          105

Test the hypothesis that the preference between printing and woodworking, as expressed by ninth-grade boys, was not influenced by residence. Interpret your answer.

7. A sample of students from various colleges of an eastern university was questioned as to whether they had, during the preceding academic year, contributed a pint of blood to a blood bank. The responses are tabulated below.

                          CONTRIBUTED BLOOD
COLLEGE                  YES       NO      TOTAL
Agriculture                7       62        69
Arts and Sciences         11       92       103
Engineering               19       69        88
Home Economics            18       35        53
Education                  3       24        27
Total                     58      282       340

Test the hypothesis that students of the various colleges did not differ in their contributions. Interpret your answer.
8. The following distribution of I.Q.'s resulted from administering a test of mental ability to 166 eighth-grade pupils in a technical school. Determine how well these scores fit a normal distribution.

I.Q. INTERVAL    FREQUENCY
125-129               1
120-124               4
115-119               5
110-114              11
105-109              11
100-104              13
95-99                28
90-94                21
85-89                20
80-84                28
75-79                10
70-74                 8
65-69                 3
60-64                 3
Total               166


10

Analysis of Variance—Single Classification


Investigations are frequently undertaken in which a comparison of two or more groups of individuals is desired on the basis of a variable or continuum characteristic. The groups may represent samples of individuals who have been exposed to different conditions, or they may be the various categories in some classification. Whenever only two groups are being compared, t is appropriate for the test of significance of the difference between them. When, however, a comparison among more than two groups is desired, t is no longer suitable. It would be possible to compare each group with every other group and then test the significance of each of these differences with t. The disadvantages of this procedure become obvious, however, when the number of possible combinations of two groups is considered.¹ For example, when there are three groups to be compared, there are only three combinations of two groups, but when there are ten groups, the number of possible combinations of two is 45. A further disadvantage of this procedure is that when so many differences are tested, some will reach significance through chance alone, so the investigator has no assurance that the differences which appear significant really are. The analysis of variance has been designed to provide an efficient test of the significance of the differences among two or more groups simultaneously. In the most easily understood case the method consists of contrasting the variance of individual values around the group means within equal-sized groups with the variance of the group means around the general or grand mean of the ungrouped data. From this relatively simple design, i.e., single classification, the method has been expanded so as to be applicable to the analysis of several complex experimental designs. The more complex analyses, however, will be discussed in later chapters.


¹ For determining the number of possible combinations of groups taken two at a time, the expression m(m − 1)/2 can be used, where m is the number of separate groups.


ANALYSIS OF VARIANCE—SINGLE CLASSIFICATION 173 


EQUAL FREQUENCY IN GROUPS BEING COMPARED 


As an example of the application of analysis of variance to a problem involving the comparison of more than two groups of equal size, the hypothetical data in Table 52 represent the scores obtained on a test of cultural knowledge by ten men enrolled in each of five colleges of a university.

Inspection of the table reveals differences among the colleges. To test the significance of these differences, the null hypothesis is assumed. In other words, a homogeneous population is postulated, from which variation among samples as great as that shown could reasonably be attributed to sampling fluctuation. A test is then made to determine whether the null hypothesis is tenable.

TABLE 52. Test Scores of Men Enrolled in Five Colleges

                                     COLLEGE
          ARTS AND                                     BUSINESS
          SCIENCES     LAW     ENGINEERING     ADMINISTRATION     EDUCATION
             10         18         10               17               11
             13         11         12               10               10
             17         13         19               14               17
             15         18         16               11               16
             16         15         13               14               16
             15         15         14               14               14
              9         16         14               15               15
             12         16         14               17               13
             14         14         12               16               15
             14         17         13               12               15
Total       135        153        137              140              142
Mean       13.5       15.3       13.7             14.0             14.2

Grand Total = 707        Grand Mean = 14.14

To test the hypothesis, the entries in Table 53 are computed. The total variation in scores is divided into the variation of the college means around the grand mean and the individual variation within colleges around each college mean. In analysis-of-variance tables such as Table 53, the word groups is sometimes used to designate the source of variation associated with the variation of the group means around the grand mean. The number of degrees of freedom for total is 49, one less than the number of individuals in the investigation; the number of degrees of freedom for colleges is 4, one less than the number of colleges; the number of degrees of freedom for within variation is 45, the difference between the two foregoing degrees of freedom, or the sum of the number of individuals in each college minus one, i.e., 9 + 9 + 9 + 9 + 9 = 45.

The sum of squares for total may be found in two ways, although the two methods are algebraically identical. The first method provides a meaningful explanation of the procedure. The second method requires much less time for solution except in situations where the mean for the entire group, as well as the means for the columns, are whole numbers. With actual data such a situation is, for all practical purposes, nonexistent.

TABLE 53. Analysis of Variance of Test Scores Among
Colleges—Equal Size Samples

SOURCE OF        DEGREES OF      SUM OF       MEAN
VARIATION        FREEDOM         SQUARES      SQUARE
Colleges              4            19.72        4.93
Within               45           254.30        5.65
Total                49           274.02       (5.59)

$$F = \frac{4.93}{5.65} = 0.873$$


By the first method, the sum of squares for total is found by summing
the square of the difference between each score and the grand mean, 14.14:

$$\text{Sum of squares for total} = (10 - 14.14)^2 + (13 - 14.14)^2 + (17 - 14.14)^2 + (15 - 14.14)^2 + (16 - 14.14)^2 + \cdots + (15 - 14.14)^2 + (15 - 14.14)^2 = 274.02$$


By the second method, this same sum of squares for total is obtained 
by subtracting a correction from the sum of the squares of each of the 50 
scores. The correction is obtained by dividing the square of the sum of the 
scores by the number of students. The general formula for the total sum
of squares, then, is

$$\text{Sum of squares for total} = \sum X^2 - \frac{(\sum X)^2}{N}$$

where
X = score
N = number of cases

Therefore, the sum of squares for total is

$$(10^2 + 13^2 + 17^2 + \cdots + 13^2 + 15^2 + 15^2) - \frac{(707)^2}{50} = 10{,}271 - 9{,}996.98 = 274.02$$


ANALYSIS OF VARIANCE—SINGLE CLASSIFICATION 175 


Thus, it is seen that the two methods will yield identical values for 
the sum of squares. 
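The two routes to the total sum of squares can be checked with a short Python sketch (an illustration added here, not part of the original text). The dictionary `scores` simply transcribes Table 52, and the variable names are hypothetical.

```python
# Table 52: cultural-knowledge test scores, ten men in each of five colleges
scores = {
    "Arts and Sciences": [10, 13, 17, 15, 16, 15, 9, 12, 14, 14],
    "Law": [18, 11, 13, 18, 15, 15, 16, 16, 14, 17],
    "Engineering": [10, 12, 19, 16, 13, 14, 14, 14, 12, 13],
    "Business Administration": [17, 10, 14, 11, 14, 14, 15, 17, 16, 12],
    "Education": [11, 10, 17, 16, 16, 14, 15, 13, 15, 15],
}

all_scores = [x for group in scores.values() for x in group]
n = len(all_scores)                     # N = 50
grand_mean = sum(all_scores) / n        # 707 / 50 = 14.14

# First method: squared deviations of every score from the grand mean
ss_total_dev = sum((x - grand_mean) ** 2 for x in all_scores)

# Second method: sum of raw squares minus the correction (sum X)^2 / N
correction = sum(all_scores) ** 2 / n   # 9,996.98
ss_total_raw = sum(x * x for x in all_scores) - correction

print(f"{ss_total_dev:.2f} {ss_total_raw:.2f}")
```

Both values print as 274.02. The deviation form is easier to reason about; the raw-score form avoids subtracting a rounded mean from every score.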

The sum of squares for colleges may also be found by two methods. 
The first is relatively easy to understand but usually time-consuming to 
compute; the second is mathematically identical and is more generally 
used. 

By the first method, the sum of squares is obtained by summing the
squared differences between the mean for each college and the grand
mean. The sum obtained is multiplied by the number of students in each
college, since each college mean stands for ten scores and the five means
therefore use only one-tenth as many entries as there are students. Thus,
in the example, the sum of squares for colleges is

$$10[(13.5 - 14.14)^2 + (15.3 - 14.14)^2 + (13.7 - 14.14)^2 + (14.0 - 14.14)^2 + (14.2 - 14.14)^2] = 19.72$$

By the second method, the total score for each college is squared, and
these squares are summed. This sum is divided by the number of students
in each college, and the correction term (previously reported) is
subtracted from the quotient:

$$\text{Sum of squares for groups} = \frac{(\sum X_1)^2 + (\sum X_2)^2 + \cdots + (\sum X_n)^2}{k} - \frac{(\sum X)^2}{N}$$

where
ΣX₁, ΣX₂, etc. = total score for any group
k = number of individuals per group

$$\text{Sum of squares for colleges} = \frac{(135)^2 + (153)^2 + (137)^2 + (140)^2 + (142)^2}{10} - \frac{(707)^2}{50} = 19.72$$


The value is then inserted in the proper place in Table 53. 
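A similar sketch, again transcribing the Table 52 scores (names are illustrative), checks that the deviation form and the totals form of the sum of squares for colleges agree.

```python
# Table 52 scores, one list per college (ten men each)
groups = [
    [10, 13, 17, 15, 16, 15, 9, 12, 14, 14],   # Arts and Sciences
    [18, 11, 13, 18, 15, 15, 16, 16, 14, 17],  # Law
    [10, 12, 19, 16, 13, 14, 14, 14, 12, 13],  # Engineering
    [17, 10, 14, 11, 14, 14, 15, 17, 16, 12],  # Business Administration
    [11, 10, 17, 16, 16, 14, 15, 13, 15, 15],  # Education
]
k = len(groups[0])                         # individuals per group
n = k * len(groups)                        # N = 50
grand_total = sum(sum(g) for g in groups)  # 707
grand_mean = grand_total / n

# First method: k times the squared deviations of college means from the grand mean
ss_groups_dev = k * sum((sum(g) / k - grand_mean) ** 2 for g in groups)

# Second method: squared college totals over k, minus the correction term
ss_groups_raw = sum(sum(g) ** 2 for g in groups) / k - grand_total ** 2 / n

print(f"{ss_groups_dev:.2f} {ss_groups_raw:.2f}")
```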

The sum of squares within each category of the classification may be 
found in two ways. Here again, the first is self-explanatory and the 
second saves time. The within sum of squares is obtained by summing 
the square of the difference between each student’s score and the mean 
score for his college. 

The within sum of squares for each college is:

Arts and Sciences
$$\text{S.S.} = (10 - 13.5)^2 + (13 - 13.5)^2 + \cdots + (14 - 13.5)^2 = 58.50$$

Law
$$\text{S.S.} = (18 - 15.3)^2 + (11 - 15.3)^2 + \cdots + (17 - 15.3)^2 = 44.10$$

Engineering
$$\text{S.S.} = (10 - 13.7)^2 + (12 - 13.7)^2 + \cdots + (13 - 13.7)^2 = 54.10$$

Business Administration
$$\text{S.S.} = (17 - 14.0)^2 + (10 - 14.0)^2 + \cdots + (12 - 14.0)^2 = 52.00$$

Education
$$\text{S.S.} = (11 - 14.2)^2 + (10 - 14.2)^2 + \cdots + (15 - 14.2)^2 = 45.60$$

The within sum of squares, then, is

58.50 + 44.10 + 54.10 + 52.00 + 45.60 = 254.30

This value is inserted in Table 53. It can be seen in this table that the
sum of squares among colleges plus that within colleges equals the total
sum of squares.

By the second method, the within sum of squares is not directly 
computed. It is found by subtracting the sum of squares for colleges from 
the total sum of squares. 
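The within sum of squares can be checked both ways as well. This sketch (names hypothetical, data from Table 52) computes each college's squared deviations about its own mean and compares their sum with the value obtained by subtraction.

```python
# Table 52 scores by college
groups = {
    "Arts and Sciences": [10, 13, 17, 15, 16, 15, 9, 12, 14, 14],
    "Law": [18, 11, 13, 18, 15, 15, 16, 16, 14, 17],
    "Engineering": [10, 12, 19, 16, 13, 14, 14, 14, 12, 13],
    "Business Administration": [17, 10, 14, 11, 14, 14, 15, 17, 16, 12],
    "Education": [11, 10, 17, 16, 16, 14, 15, 13, 15, 15],
}

# First method: squared deviations of each score from its own college mean
within_by_group = {}
for name, g in groups.items():
    mean = sum(g) / len(g)
    within_by_group[name] = sum((x - mean) ** 2 for x in g)
ss_within_direct = sum(within_by_group.values())

# Second method: total sum of squares minus sum of squares for colleges
all_scores = [x for g in groups.values() for x in g]
correction = sum(all_scores) ** 2 / len(all_scores)
ss_total = sum(x * x for x in all_scores) - correction
ss_groups = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
ss_within_by_subtraction = ss_total - ss_groups

for name, ss in within_by_group.items():
    print(f"{name:<25}{ss:>8.2f}")
print(f"{ss_within_direct:.2f} {ss_within_by_subtraction:.2f}")
```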

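The convenient method, then, for obtaining the sums of squares may be
summarized:

$$\text{For total:}\quad \text{S.S.}_t = \sum X^2 - \frac{(\sum X)^2}{N}$$

$$\text{For groups:}\quad \text{S.S.}_g = \frac{(\sum X_1)^2 + (\sum X_2)^2 + \cdots + (\sum X_n)^2}{k} - \frac{(\sum X)^2}{N}$$

$$\text{For within:}\quad \text{S.S.}_w = \text{S.S.}_t - \text{S.S.}_g$$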

The mean square values needed in Table 53 are obtained by dividing each
sum of squares by the corresponding degrees of freedom. The mean square
for total is an estimate of the population variance; if an estimate of the
standard deviation of the population from which the sample was drawn is
wanted, it may be found by extracting the square root. If the standard
deviation of the entire sample itself is wanted, the sum of squares for
total is divided by N rather than by N − 1, and the square root of this
quotient is extracted. The mean square for colleges and the mean square
for within are two independent estimates of the population variance. To
test the significance of the differences among the five colleges, these
estimates of variance are compared by

$$F = \frac{\text{Group mean square}}{\text{Within mean square}}$$

It is well to note that the mean square of the source of variation being
tested is placed in the numerator and the within mean square in the
denominator. In Table 53, then,

$$F = \frac{4.93}{5.65} = 0.873$$
Consulting a table of F, it is seen that this value is not significant with
4 and 45 degrees of freedom. Thus evidence is not available for rejecting
the hypothesis that students in the various colleges do not differ in
cultural knowledge as measured by the test. In using the table of F, the
number of degrees of freedom for the groups being compared (the main
effects) is shown across the top of the table, and that for within down
the left-hand side. It might be well to point out here that, irrespective
of the number of degrees of freedom, when F is less than unity it is
unnecessary to consult a table, since such a value is always nonsignificant.
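A small helper can assemble the whole single-classification table. This is a sketch under stated assumptions: `one_way_anova` is a hypothetical name; it divides each squared group total by that group's own size, which reduces to the equal-size formula when all sizes are equal; and it stops at the F-ratio, leaving the significance decision to a table of F as the text describes.

```python
def one_way_anova(groups):
    """Single-classification analysis of variance.

    Returns a dict mapping each source of variation to
    (degrees of freedom, sum of squares, mean square), and the F-ratio.
    """
    all_scores = [x for g in groups for x in g]
    n = len(all_scores)
    correction = sum(all_scores) ** 2 / n
    ss_total = sum(x * x for x in all_scores) - correction
    ss_groups = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_groups
    df_groups, df_within, df_total = len(groups) - 1, n - len(groups), n - 1
    rows = {
        "Groups": (df_groups, ss_groups, ss_groups / df_groups),
        "Within": (df_within, ss_within, ss_within / df_within),
        "Total": (df_total, ss_total, ss_total / df_total),
    }
    f_ratio = rows["Groups"][2] / rows["Within"][2]
    return rows, f_ratio


# Table 52 scores, one list per college
table_52 = [
    [10, 13, 17, 15, 16, 15, 9, 12, 14, 14],
    [18, 11, 13, 18, 15, 15, 16, 16, 14, 17],
    [10, 12, 19, 16, 13, 14, 14, 14, 12, 13],
    [17, 10, 14, 11, 14, 14, 15, 17, 16, 12],
    [11, 10, 17, 16, 16, 14, 15, 13, 15, 15],
]
rows, f_ratio = one_way_anova(table_52)
for source, (df, ss, ms) in rows.items():
    print(f"{source:<8}{df:>4}{ss:>10.2f}{ms:>8.2f}")
print(f"F = {f_ratio:.3f}")
```

With unrounded mean squares the ratio comes out 0.872; the 0.873 in Table 53 results from dividing the rounded values 4.93 and 5.65. Either way F is below unity.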


UNEQUAL FREQUENCY IN GROUPS 
BEING COMPARED 

The use of equal numbers of cases in the groups being compared is to 
be recommended, in which case the computation described in the fore- 
going section will be followed. In many situations, particularly in the 
social sciences, the testing of significance involves unequal numbers of 
cases in the groups being compared. When this situation prevails, the 
procedure for computation involves the modifications indicated in the 
following section. 

Whenever unequal numbers of cases appear in the columns, the same 
techniques are used as those used for equal frequencies with one excep- 
tion. It will be recalled that the sum of squares among groups with 
equal frequencies is 


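$$\text{S.S.} = \frac{(\sum X_1)^2 + (\sum X_2)^2 + \cdots + (\sum X_n)^2}{k} - \frac{(\sum X)^2}{N}$$

For unequal frequencies, each squared total is divided by the number of
cases involved in obtaining that total, i.e.,

$$\text{S.S.} = \frac{(\sum X_1)^2}{k_1} + \frac{(\sum X_2)^2}{k_2} + \cdots + \frac{(\sum X_n)^2}{k_n} - \frac{(\sum X)^2}{N}$$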

The cultural knowledge test scores for 50 men enrolled in five colleges 
of a university were shown in Table 52. As an example of the application 
of the analysis of variance to a single classification in which the fre- 
quencies in the groups being compared are unequal, let it be assumed 
that Arts and Sciences had only the first six students listed; Law the first 
seven; Engineering the first eight; Business Administration the first nine; 
and Education the ten listed. 

Then the sum of squares for total is

$$\text{S.S.} = 10^2 + 13^2 + 17^2 + \cdots + 13^2 + 15^2 + 15^2 - \frac{(574)^2}{40} = 8456.00 - 8236.90 = 219.10$$

The sum of squares for colleges is

$$\text{S.S.} = \frac{(86)^2}{6} + \frac{(106)^2}{7} + \frac{(112)^2}{8} + \frac{(128)^2}{9} + \frac{(142)^2}{10} - \frac{(574)^2}{40} = 8242.65 - 8236.90 = 5.75$$




TABLE 54. Analysis of Variance of Test Scores Among 
Colleges—Unequal Size Samples 


SOURCE OF      DEGREES OF    SUM OF     MEAN
VARIATION      FREEDOM       SQUARES    SQUARE
Colleges            4           5.75     1.44
Within             35         213.35     6.10
Total              39         219.10    (5.62)


The sums of squares for total and for colleges are then entered in Table
54. The within sum of squares can be obtained by subtraction.

Since the ratio of the mean square among colleges to that within colleges
is less than unity, the differences among colleges are not significant.
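The unequal-frequency example can be reproduced by truncating the Table 52 columns to their first 6, 7, 8, 9 and 10 scores. In this sketch (names illustrative) each squared college total is divided by its own number of cases, as in the formula for unequal frequencies.

```python
# Table 52 columns in college order; the example keeps only the first
# 6, 7, 8, 9 and 10 men of the respective colleges.
table_52 = [
    [10, 13, 17, 15, 16, 15, 9, 12, 14, 14],   # Arts and Sciences
    [18, 11, 13, 18, 15, 15, 16, 16, 14, 17],  # Law
    [10, 12, 19, 16, 13, 14, 14, 14, 12, 13],  # Engineering
    [17, 10, 14, 11, 14, 14, 15, 17, 16, 12],  # Business Administration
    [11, 10, 17, 16, 16, 14, 15, 13, 15, 15],  # Education
]
sizes = [6, 7, 8, 9, 10]
groups = [column[:size] for column, size in zip(table_52, sizes)]

all_scores = [x for g in groups for x in g]
n = len(all_scores)                        # N = 40
correction = sum(all_scores) ** 2 / n      # (574)^2 / 40 = 8,236.90

ss_total = sum(x * x for x in all_scores) - correction
# Each squared group total is divided by its own number of cases k_i
ss_groups = sum(sum(g) ** 2 / len(g) for g in groups) - correction
ss_within = ss_total - ss_groups

df_groups, df_within = len(groups) - 1, n - len(groups)   # 4 and 35
f_ratio = (ss_groups / df_groups) / (ss_within / df_within)
print(f"{ss_total:.2f} {ss_groups:.2f} {ss_within:.2f} F = {f_ratio:.2f}")
```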


RELATION OF t TO F FOR COMPARISON 
OF TWO GROUPS 


Whenever only two groups are being compared, whether they contain
equal or unequal frequencies, the analysis of variance can be applied in
the manner previously described. If desired, the value of t can be
obtained from this analysis by extracting the square root of the result-
ing F-value. In fact, in most situations where t is the statistical measure
wanted, particularly when the groups being compared are of unequal
size, it is more convenient to compute F by the analysis of variance and
obtain the t-value from it than to compute t, or the customary "critical
ratio" of classical sampling theory, directly.


TABLE 55. Achievement Data from Notebook Investigation of 46 Students 


GROUP           k      ΣX        ΣX²         Σx²        MEAN
Notebook       23    1274     81,118    10,549.48     55.39
No Notebook    23    1429     98,331     9,546.61     62.13
Total          46    2703    179,449    20,618.37     58.76


To demonstrate the relationship between t and F when the number of 
degrees of freedom for the F-test numerator is one, the data in Table 55 
will be used. These data have been selected from an investigation of the 
effectiveness of preparing a notebook in an eighth-grade science course. 
Forty-six students were divided into two groups of 23 each. Each group 
had identical teaching with the exception of the notebook requirement 
for one group. The criterion in this aspect of the investigation was the
score obtained on a final examination over the entire course. The null
hypothesis was assumed, i.e., there is no greater difference between these
groups when they are compared on the basis of final examination scores
than would be expected from sampling fluctuation in a population in
which there is no true difference between groups.

The analysis of variance for these groups is shown in Table 56.

TABLE 56. Analysis of Variance of Achievement Scores
of Notebook Investigation

SOURCE OF      DEGREES OF    SUM OF        MEAN
VARIATION      FREEDOM       SQUARES       SQUARE
Groups              1           522.28     522.28
Within             44        20,096.09     456.73
Total              45        20,618.37

$$F = \frac{522.28}{456.73} = 1.144 \qquad \sqrt{F} = t = \sqrt{1.144} = 1.069$$

The F-value of 1.144 with one and forty-four degrees of freedom is
nonsignificant. Extracting the square root of this value yields a t of
1.069, also nonsignificant with (k₁ − 1) + (k₂ − 1), or 44, degrees of
freedom.

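Substituting the data in Table 55 into the formula for obtaining t when
two independent groups of equal size are being compared,

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum x_1^2 + \sum x_2^2}{k(k - 1)}}}$$

the formula becomes

$$t = \frac{62.13 - 55.39}{\sqrt{\dfrac{9546.61 + 10{,}549.48}{23(23 - 1)}}} = \frac{6.74}{6.302} = 1.069$$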


It should be noted that k is the number of cases in either group. 

Thus, with either method the same results are achieved and the same 
conclusion reached that available evidence is insufficient to reject the 
null hypothesis of no difference between the two groups on the basis of 
final examination score. 
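The equivalence of t and F for two groups can be confirmed from the Table 55 summaries alone. The sketch below (variable names are illustrative) computes F from the analysis of variance and t from the equal-size formula above; the two agree in the sense that t² equals F.

```python
import math

# Table 55 summaries: (k, sum of X, sum of squared deviations x^2)
notebook = (23, 1274, 10_549.48)
no_notebook = (23, 1429, 9_546.61)

k1, sum1, dev1 = notebook
k2, sum2, dev2 = no_notebook
n = k1 + k2
grand = sum1 + sum2

# Analysis of variance built from the summaries
ss_groups = sum1 ** 2 / k1 + sum2 ** 2 / k2 - grand ** 2 / n   # about 522.28
ss_within = dev1 + dev2                                        # 20,096.09
df_within = (k1 - 1) + (k2 - 1)                                # 44
f_ratio = ss_groups / (ss_within / df_within)                  # 1 df in numerator

# t for two independent groups of equal size k
k = k1
mean_diff = sum2 / k2 - sum1 / k1                              # 62.13 - 55.39
t = mean_diff / math.sqrt(ss_within / (k * (k - 1)))

print(f"F = {f_ratio:.3f}, sqrt(F) = {math.sqrt(f_ratio):.3f}, t = {t:.3f}")
```

The identity holds only when the numerator of F has one degree of freedom, i.e., when exactly two groups are compared.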




ANALYSIS OF VARIANCE APPLIED TO 
FREQUENCY DISTRIBUTIONS 


The analysis of variance may be applied to three types of frequency 
distributions. These distributions include single-unit classification, mul- 
tiple-unit classifications, and qualitative ratings on a continuum which 
may be assigned numerical values. 

An example of a frequency distribution tabulated according to a 
single-unit classification is shown in Table 57. These data have been tabu- 


TABLE 57. Test Items Correct for Students in Three
Sections of Beginning Psychology

NUMBER OF               SECTION
ITEMS CORRECT      A       B       C      TOTAL
      0            1       0       2         3
      1            3       1       1         5
      2            1       2       3         6
      3           ...     ...     ...       10
     ...          ...     ...     ...       ...
     10           ...     ...     ...        5
     11            1       1       2         4
    Total         44      44      45       133


lated according to the number of students in three sections of a course in 
beginning psychology who answered a given number of items correctly 
in a short examination. The analysis of variance can be followed through 
in the usual manner. The sum of squares for total is 


$$\text{S.S.} = \sum X^2 - \frac{(\sum X)^2}{N}$$

$$= 3(0)^2 + 5(1)^2 + 6(2)^2 + 10(3)^2 + \cdots + 5(10)^2 + 4(11)^2 - \frac{[3(0) + 5(1) + 6(2) + 10(3) + \cdots + 5(10) + 4(11)]^2}{133}$$

$$= 5625 - 4776.01 = 848.99$$


The sum of squares for sections is

$$\text{S.S.} = \frac{(\sum X_1)^2}{k_1} + \frac{(\sum X_2)^2}{k_2} + \frac{(\sum X_3)^2}{k_3} - \frac{(\sum X)^2}{N}$$

where
ΣX₁ = 1(0) + 3(1) + 1(2) + ⋯ + 1(11) = 258
ΣX₂ = 0(0) + 1(1) + 2(2) + ⋯ + 1(11) = 280
ΣX₃ = 2(0) + 1(1) + 3(2) + ⋯ + 2(11) = 259
k₁ = 44    k₂ = 44    k₃ = 45

Therefore, the sum of squares for sections is

$$\text{S.S.} = \frac{(258)^2}{44} + \frac{(280)^2}{44} + \frac{(259)^2}{45} - \frac{(797)^2}{133} = 9.32$$


The sum of squares within sections is

S.S. = 848.99 − 9.32 = 839.67
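When only group totals, group sizes, and the overall ΣX² are available, as with Table 57, the analysis can be run from those summaries. The sketch below takes ΣX² = 5625, the figure used in the computation of the total sum of squares; the dictionary names are hypothetical.

```python
# Table 57 summaries as reported in the text (sections A, B, C)
section_totals = {"A": 258, "B": 280, "C": 259}  # sum of X in each section
section_sizes = {"A": 44, "B": 44, "C": 45}       # number of students k in each section
sum_x_squared = 5625                              # sum of X^2 over all students

n = sum(section_sizes.values())                   # 133
grand_total = sum(section_totals.values())        # 797
correction = grand_total ** 2 / n                 # about 4,776.01

ss_total = sum_x_squared - correction
ss_sections = sum(section_totals[s] ** 2 / section_sizes[s]
                  for s in section_totals) - correction
ss_within = ss_total - ss_sections

df_sections = len(section_totals) - 1             # 2
df_within = n - len(section_totals)               # 130
f_ratio = (ss_sections / df_sections) / (ss_within / df_within)
print(f"{ss_total:.2f} {ss_sections:.2f} {ss_within:.2f} F = {f_ratio:.2f}")
```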


These sums of squares are then entered in Table 58, and the mean
squares are obtained by dividing by the numbers of degrees of freedom.
Since an F-value less than unity can never be significant, the F of 0.72
gives insufficient evidence to reject the null hypothesis. On the basis of
available evidence, no difference in number of items correct can be
demonstrated among these three sections.

TABLE 58. Analysis of Variance of Test Scores of Three
Sections of Beginning Psychology

SOURCE OF      DEGREES OF    SUM OF     MEAN
VARIATION      FREEDOM       SQUARES    SQUARE
Sections            2           9.32     4.66
Within            130         839.67     6.46
Total             132         848.99

$$F = \frac{4.66}{6.46} = 0.72$$

In many distributions, the values have been grouped into intervals 
which are not single units. An example of a multiple-unit distribution is 
shown in Table 59. 


TABLE 59. Sex and Salary of Teachers

SALARY
INTERVALS      CODE    MALE    FEMALE    BOTH
4400-4599        6       4        1        5
4200-4399        5       8        5       13
4000-4199        4      12       10       22
3800-3999        3      15       12       27
3600-3799        2       8       18       26
3400-3599        1       3        7       10
3200-3399        0       1        3        4
Total                   51       56      107


Although the analysis may be carried out by the method just described,
using the mid-points of the intervals, viz., 3300, 3500, and so on,
involves squaring large numbers. The work can be done in much less time
if code numbers, beginning with zero, are assigned to the intervals, as
has been done in Table 59. The analysis of variance is then

$$\text{S.S. for total} = 4(0)^2 + 10(1)^2 + 26(2)^2 + \cdots + 5(6)^2 - \frac{[4(0) + 10(1) + 26(2) + \cdots + 5(6)]^2}{107} = 220.77$$

$$\text{S.S. for sex} = \frac{(176)^2}{51} + \frac{(150)^2}{56} - \frac{(326)^2}{107} = 15.93$$

S.S. for within = 220.77 − 15.93 = 204.84
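For a grouped frequency table the same computations are simply weighted by the frequencies. This sketch (the helper `anova_from_frequencies` is a hypothetical name) takes the code numbers of Table 59 and one column of frequencies per group.

```python
def anova_from_frequencies(codes, columns):
    """Single-classification analysis of variance for a frequency table.

    codes   -- code number assigned to each row (interval or category)
    columns -- {group name: list of frequencies, one per row}
    Returns (ss_total, ss_groups, ss_within, F).
    """
    totals = {g: sum(f * c for f, c in zip(freqs, codes)) for g, freqs in columns.items()}
    sizes = {g: sum(freqs) for g, freqs in columns.items()}
    n = sum(sizes.values())
    grand = sum(totals.values())
    sum_sq = sum(f * c * c for freqs in columns.values() for f, c in zip(freqs, codes))
    correction = grand ** 2 / n
    ss_total = sum_sq - correction
    ss_groups = sum(totals[g] ** 2 / sizes[g] for g in columns) - correction
    ss_within = ss_total - ss_groups
    df_groups, df_within = len(columns) - 1, n - len(columns)
    f_ratio = (ss_groups / df_groups) / (ss_within / df_within)
    return ss_total, ss_groups, ss_within, f_ratio


# Table 59, rows from 4400-4599 (code 6) down to 3200-3399 (code 0)
codes = [6, 5, 4, 3, 2, 1, 0]
salary = {"Male": [4, 8, 12, 15, 8, 3, 1], "Female": [1, 5, 10, 12, 18, 7, 3]}
ss_t, ss_sex, ss_w, f = anova_from_frequencies(codes, salary)
print(f"{ss_t:.2f} {ss_sex:.2f} {ss_w:.2f} F = {f:.2f}")
```

Carried out without intermediate rounding, the sum of squares for sex is 15.92 and F is about 8.16; the 15.93 and 8.17 in the text come from rounding the correction term to 993.23 before subtracting. The conclusion is unaffected.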


The analysis of variance is then shown in Table 60. An F-value of 8.17
with 1 and 105 degrees of freedom is significant beyond the 1 per cent
level, and the null hypothesis is rejected. The evidence indicates a sex
difference in salary in the populations from which the samples have
been drawn.

TABLE 60. Analysis of Variance of Teachers' Salaries

SOURCE OF      DEGREES OF    SUM OF     MEAN
VARIATION      FREEDOM       SQUARES    SQUARE
Sex                 1          15.93    15.93
Within            105         204.84     1.95
Total             106         220.77    (2.08)

$$F = \frac{15.93}{1.95} = 8.17$$

Many distributions are available in which classifications have been made
in terms of qualitative descriptions of a continuous variable. These
qualitative descriptions may be assigned numerical values. The usual
assumption is made of equal distances between consecutive descriptions,
and code numbers are assigned, beginning with zero. In certain cases, for
logical reasons, numbers more appropriate than consecutive integers may
be assigned.

An example of a qualitative description distribution is shown in Table 
61. Here the diets are assigned code numbers from zero to three. The 
analysis of variance is then made in the usual manner. 


TABLE 61. Diets of High School Pupils and Occupations of Fathers 


DIET        CODE   FARMERS   PROFESSIONAL   BUSINESS   LABOR   OTHERS   TOTAL
Excellent     3       10           5            12        10       8       45
Good          2       20          10            13        10       8       61
Fair          1       20           5            10        12      10       57
Poor          0        5           2             4         6       7       24
Total                 55          22            39        38      33      187


$$\text{S.S. for total} = 45(3)^2 + 61(2)^2 + 57(1)^2 + 24(0)^2 - \frac{[45(3) + 61(2) + 57(1) + 24(0)]^2}{187} = 706 - 527.25 = 178.75$$

$$\text{S.S. for occupation} = \frac{(90)^2}{55} + \frac{(40)^2}{22} + \frac{(72)^2}{39} + \frac{(62)^2}{38} + \frac{(50)^2}{33} - \frac{(314)^2}{187} = 2.59$$


TABLE 62. Analysis of Variance of Diets 


SOURCE OF      DEGREES OF    SUM OF     MEAN
VARIATION      FREEDOM       SQUARES    SQUARE
Occupation          4           2.59     0.65
Within            182         176.16     0.97
Total             186         178.75    (0.96)


The completed analysis is shown in Table 62. Since the mean square 
for occupations is smaller than for within, the differences are not sig- 
nificant. There is insufficient evidence to indicate that the diets of high 
school pupils are a function of fathers’ occupations. 
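The qualitative ratings of Table 61 are handled by first mapping each description to its code number and then weighting by the frequencies. The sketch assumes the codes 3, 2, 1, 0 given in the table; the dictionary names are illustrative.

```python
# Table 61: diet ratings coded Excellent = 3, Good = 2, Fair = 1, Poor = 0
diet_codes = {"Excellent": 3, "Good": 2, "Fair": 1, "Poor": 0}
counts = {  # frequencies by father's occupation, in the order of diet_codes
    "Farmers": [10, 20, 20, 5],
    "Professional": [5, 10, 5, 2],
    "Business": [12, 13, 10, 4],
    "Labor": [10, 10, 12, 6],
    "Others": [8, 8, 10, 7],
}
codes = list(diet_codes.values())  # dicts keep insertion order (Python 3.7+)

n = sum(sum(freqs) for freqs in counts.values())                        # 187
totals = {occ: sum(f * c for f, c in zip(freqs, codes)) for occ, freqs in counts.items()}
grand = sum(totals.values())                                            # 314
sum_sq = sum(f * c * c for freqs in counts.values() for f, c in zip(freqs, codes))
correction = grand ** 2 / n

ss_total = sum_sq - correction
ss_occupation = sum(totals[o] ** 2 / sum(counts[o]) for o in counts) - correction
ss_within = ss_total - ss_occupation
ms_occupation = ss_occupation / (len(counts) - 1)
ms_within = ss_within / (n - len(counts))
print(f"{ss_total:.2f} {ss_occupation:.2f} {ss_within:.2f} F = {ms_occupation / ms_within:.2f}")
```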


TESTING SIGNIFICANCE BETWEEN TWO MEANS 
FOLLOWING THE ANALYSIS OF VARIANCE 


Whenever a significant F-value has been found from the application 
of the analysis of variance to a single classification containing more 
than two categories, tests of the significance of the differences between 
specific pairs of sub-groups may be desired. These tests can be accom- 
plished by computing t-values between any two means. It should be noted, 
however, that the individual mean differences can no longer be regarded 
as random observations from a normally distributed population since the 
F-value found was significant. Further, some of the evidence collected in 
the initial design of the investigation is being ignored. Because of these 
considerations, it is probably appropriate to consider significant t-values
between the specific group means only as indications of areas where 
additional research may be desirable. 


ASSUMPTIONS UNDERLYING THE 
ANALYSIS OF VARIANCE 


Underlying the application of the analysis of variance are several as- 
sumptions upon which the development of this method has been based. 
The more the data in an investigation depart from the strict fulfillment 
of the assumptions the more likely is the investigator to reach erroneous 
conclusions. In the actual research situation, particularly in the social
sciences, it may be difficult to satisfy all assumptions. It is doubtful,
however, whether this failure is sufficiently great in most situations to
invalidate the application of the technique. Recent evidence suggests
that the limits of tolerance within which the assumptions must be ap-
proximated are wider than was originally thought.

One of the major assumptions in the analysis of variance is that the 
observations within each category must be random samples. If this con- 
dition is not approximated, the effectiveness of the classification cannot 
be tested accurately. 




Another major assumption is that the variances within the subgroups
are homogeneous, i.e., that the data come from a single normally
distributed population. It is assumed that the best estimate of the
population variance can be obtained by pooling the variances of the
subgroups. If this condition is true, each individual subgroup variance
should yield the same evidence about the population variance. If this
latter assumption is not fulfilled, the level of significance of the
differences cannot be considered exact. There is increasing evidence,
however, that homogeneity of variance is not as serious a consideration
as it was formerly thought to be.¹

From theoretical considerations the foregoing assumptions must be 
satisfied before the application of the analysis of variance is appropriate. 
However, it is becoming more apparent that the analysis of variance 
technique is sufficiently satisfactory even when there is considerable 
departure from the strict fulfillment of the assumptions. 


Exercises 


1. A sample of single, 30-year-old, college-trained men was administered an 
attitude-toward-war test. Of the 98 men included in the sample, 32 were combat 
veterans of World War II, 30 were noncombat veterans, and 36 were non- 
veterans. 


COMBAT VETERANS NONCOMBAT VETERANS NONVETERANS 
NUMBER SCORE NUMBER SCORE NUMBER SCORE 

1 95 1 82 1 106

2 94 2 81 2 98 

3 81 3 105 3 107 

4 86 4 95 4 89 

5 103 5 7 5 72 

6 95 6 105 6 94 

7 70 7 79 7 72 

8 100 8 90 8 86 

9 71 9 101 9 95 
10 85 10 105 10 102 
11 84 11 84 11 107
12 78 12 97 12 79 
13 83 13 109 13 103 
14 90 14 89 14 110 
15 104 15 70 15 76 


1 A more detailed discussion of the extent to which the assumption of homogeneity 
of variance is demanded for the application of the analysis of variance is included 
in W. G. Cochran and G. M. Cox, Experimental Designs (New York, John Wiley 
& Sons, Inc., 1950). 

Tests for homogeneity of variance when more than two groups are being com- 
pared can be found in P. O. Johnson, Statistical Methods in Research (New York, 
Prentice-Hall, Inc., 1949). The original manuscripts describing these tests which are 
discussed by Johnson may not be readily available. 




COMBAT VETERANS NONCOMBAT VETERANS NONVETERANS 
NUMBER SCORE NUMBER SCORE NUMBER SCORE 

16 115 16 76 16 83 
17 94 17 76 17 71 
18 100 18 91 18 111 
19 100 19 110 19 84 
20 80 20 79 20 78 
21 78 21 105 21 90 
22 89 22 95 22 98 
23 85 23 113 23 96 
24 94 24 115 24 108 
25 95 25 85 25 96 
26 116 26 74 26 114 
27 89 27 113 27 75 
28 102 28 92 28 95 
29 101 29 88 29 70
30 81 30 80 30 89 
31 87 

33 79 

34 100 

35 80 

36 82 


If a high score is indicative of an unfavorable attitude toward war, test the 
null hypothesis that there is no difference among single, 30-year-old, college- 
trained combat veterans, noncombat veterans, and nonveterans in their attitude 
toward war as measured by this test. Interpret your answer.

2. Two sophomore classes in general psychology were taught in a similar man-
ner for all practical purposes. However, Class A met at 9:00 a.m., whereas Class
B met at 1:00 p.m. Pretest and final test scores of both classes are shown in
the following table. Using the difference between the final test and pretest as
the criterion, test the hypothesis that the two classes did not differ in the gain
manifested during the semester. Interpret your answer.


CLASS A CLASS B
STUDENT PRETEST FINAL TEST STUDENT PRETEST FINAL TEST 

1 89 120 1 53 68 
2 35 15 2 74 101 
3 75 110 3 82 111 
4 76 114 4 æ a 
5 74 115 5 

6 62 102 6 45 63 
7 Ze 92 7 80 97 
8 Fe 70 8 67 93 
9 Z 60 9 73 88 
10 PË 69 10 48 90 




CLASS A CLASS B 
STUDENT PRETEST FINAL TEST STUDENT PRETEST FINAL TEST 
11 73 80 11 60 62 
12 69 64 12 52 100 
13 60 50 13 57 77 
14 70 102 14 51 81 
15 50 101 15 63 73
16 59 86 16 79 75 
17 51 95 17 55 79 
18 54 96 18 56 71 
19 53 105 19 43 73 
20 52 102 20 46 68 


3. The classroom behavior ratings of 84 first-grade pupils having parents in 
professional and nonprofessional fields were reported as follows: 


OCCUPATION OF PARENTS 
RATINGS PROFESSIONAL NONPROFESSIONAL TOTAL 


ke KEE) 


RS DO 
(0 =I WN 00H cO Or a 
Di 
KI 


Total 36 48 84 


The higher the rating, the more satisfactory the classroom behavior. 

Test the hypothesis that children having parents in professional fields and 
children of parents in nonprofessional occupations do not differ in classroom 
behavior as measured by the ratings. 

4. In answer to a statement pertaining to the effectiveness of their Ph.D.
program at a certain university, a sample of graduates who had varying educa-
tional experience in secondary schools before beginning the program responded
as follows:




SECONDARY SCHOOL EXPERIENCE 


TEACHING 
AND 
ADMINIS- ADMINIS- 

RESPONSE TEACHING TRATIVE TRATIVE NONE TOTAL 
Strongly Agree 9 7 7 1 24
Agree 12 9 8 1 30
Undecided 6 2 4 5 17 
Disagree 5 2 3 4 14 
Strongly Disagree 1 1 2 4 8 
Total 33 21 24 15 93 




Test the hypothesis that the response to the item was not influenced by the 
nature of the secondary school experience of the graduates. Interpret your answer. 
5. A psychologist has undertaken an extensive investigation of the behavior 
characteristic “cautiousness.” Cautiousness scores of a random sample of 200 
adolescent boys and 200 adolescent girls are summarized in the following table: 


        BOYS     GIRLS     TOTAL
N        200       200       400
ΣX     3,407     4,968     8,475
ΣX²                      201,471


Test the hypothesis that adolescent boys and girls do not differ in cautiousness 
as measured by the test used. Interpret your answer. 

6. Socio-economic ratings were determined for a sample of male college stu- 
dents living in fraternities, dormitories, and private rooming houses. A high 
rating was indicative of high socio-economic level. The data are summarized 
below: 


                                       ROOMING
SYMBOL    FRATERNITY    DORMITORY     HOUSE     TOTAL
N             30            30          30        90
ΣX           261           198         212       671
ΣX²                                            5,362


Test the hypothesis that college students living in the various types of housing 
do not differ in regard to socio-economic status as measured by the rating scale. 


Interpret your answer. 


11 


Analysis of Variance— Multiple 


Classification 


In the preceding chapter pertaining to the analysis of variance, the 
treatment of data divided on the basis of one classification was discussed. 
However, when designing experiments in educational and psychological 
research, the possibility of classifying the data in more than one manner 
is invariably considered. Indeed, it is often not only possible but also 
advisable to design studies so that the results permit the testing of 
hypotheses concerning separate subdivisions of the data. It is in keeping 
with efficient experimental methods to incorporate logical multiple classifi-
cation in such research problems.

In addition, the subdivision of the data into two or more classifications 
allows the investigator to control certain characteristics known to in- 
fluence the result of the experiment, or perhaps only suspected of such 
influences. In this manner possible sources of bias can be controlled and 
the demands of sound experimental design for experiments free of bias 
can be met, at least in part, by meaningful classification of the data. 
Such classification also sensitizes the test of significance by enabling the 
investigator to identify more of the sources of variation in his inves- 
tigation; thus the mean square for the within source of variation is 


TABLE 63. Miles per Gallon of Gasoline with Five Makes of Automobiles 


                           MAKE
MODEL            I     II    III    IV     V    TOTAL
Current         26     22     22    24    18     112
1 Year Old      24     21     20    20    20     105
2 Years Old     22     18     19    19    16      94
3 Years Old     20     15     17    13    15      80
4 Years Old     14     12     11    18    12      67
Total          106     88     89    94    81     458


ANALYSIS OF VARIANCE—MULTIPLE CLASSIFICATION 189 


correspondingly reduced. This two-fold advantage of the multiple classi- 
fication, along with the computational method of the analysis of variance 
with multiple classification, can be illustrated by the following example. 

An automotive engineer wished to study the gasoline mileage of five 
different makes of automobiles. Under carefully controlled driving condi- 
tions, five models of each make varying from current to four years of age 
were tested. The gasoline mileage is shown in Table 63. 

With the application of the analysis of variance with single classifica- 
tion the null hypothesis that there is no difference among the five makes 
of automobiles with regard to gasoline consumption can be tested. 

Accordingly, for the single classification on the basis of make,

$$\text{S.S. for total} = (26)^2 + (24)^2 + \cdots + (12)^2 - \frac{(458)^2}{25} = 393.44$$

$$\text{S.S. for makes} = \frac{(106)^2}{5} + \frac{(88)^2}{5} + \frac{(89)^2}{5} + \frac{(94)^2}{5} + \frac{(81)^2}{5} - \frac{(458)^2}{25} = 69.04$$

S.S. for within = 393.44 − 69.04 = 324.40
The analysis of variance is completed in Table 64. 
TABLE 64. Analysis of Variance of Gasoline Mileage
by Make of Automobile Only

SOURCE OF      DEGREES OF    SUM OF     MEAN
VARIATION      FREEDOM       SQUARES    SQUARE
Makes               4          69.04    17.26
Within             20         324.40    16.22
Total              24         393.44

$$F = \frac{17.26}{16.22} = 1.06$$


The F-value of 1.06 with 4 and 20 degrees of freedom is nonsignificant.
Therefore it has not been possible to reject the null hypothesis.

However, a more refined test could be made. An examination of the 
data reveals that the individual differences among automobiles vary 
with the age of the automobile. Hence classification of the data on the 
basis of model seems appropriate. For this purpose an extension of the 
single-classification analysis is necessary.

The data in Table 63 can be used to determine a sum of squares due
to model as well as that due to make of automobile. For the double
classification of make and model,


$$\text{S.S. for total} = (26)^2 + (24)^2 + \cdots + (12)^2 - \frac{(458)^2}{25} = 393.44$$

$$\text{S.S. for makes} = \frac{(106)^2}{5} + \frac{(88)^2}{5} + \frac{(89)^2}{5} + \frac{(94)^2}{5} + \frac{(81)^2}{5} - \frac{(458)^2}{25} = 69.04$$

$$\text{S.S. for models} = \frac{(112)^2}{5} + \frac{(105)^2}{5} + \frac{(94)^2}{5} + \frac{(80)^2}{5} + \frac{(67)^2}{5} - \frac{(458)^2}{25} = 268.24$$

S.S. for within = 393.44 − (69.04 + 268.24) = 56.16

It is well to note that, when calculating the sums of squares for makes
and for models, the sum of the numerators of the fractions to be added,
prior to squaring, equals the numerator of the correction term prior to
squaring, and the sum of the denominators equals the denominator of the
correction term. This is a helpful rule to remember when computing the
sums of squares for either single- or multiple-classification analyses
of variance.

The analysis of variance for the double classification of make and
model is shown in Table 65. From this double classification it is possible
to test two hypotheses. The original null hypothesis that there is no dif-
ference among the five makes regarding gasoline mileage can be tested,
along with a second null hypothesis that there is no difference among
models of automobiles regarding gasoline mileage.

TABLE 65. Analysis of Variance of Gasoline Mileage
by Make and Model of Automobile

SOURCE OF      DEGREES OF    SUM OF     MEAN
VARIATION      FREEDOM       SQUARES    SQUARE
Makes               4          69.04    17.26
Models              4         268.24    67.06
Within             16          56.16     3.51
Total              24         393.44

$$\text{For make, } F = \frac{17.26}{3.51} = 4.92 \qquad \text{For model, } F = \frac{67.06}{3.51} = 19.11$$

It will be recalled that the first null hypothesis could not be rejected
with the original single-classification analysis shown in Table 64. However,
when the analysis of variance for both make and model is computed as
shown in Table 65, the F-value for makes of 4.92 with 4 and 16 degrees of
freedom is significant. The null hypothesis is no longer tenable; the makes
of automobiles differ in rate of gasoline consumption.

It can be seen that the additional classification has provided a more
sensitive test of significance, since much of the variation among auto-
mobiles could be removed by classifying them according to model. In the
foregoing case the increase in sensitivity is evidenced by the reduction in
the denominator of the F-value for makes from 16.22 in Table 64 to 3.51 in
Table 65, whereas the numerator remained constant. Additional classifi-
cation is usually accompanied by greater sensitivity in the test of sig-
nificance, the increase in sensitivity depending on the meaningfulness of
the classification.

Multiple classification has also increased the amount of information that
can be obtained from the data by providing a test for the second null
hypothesis already mentioned. This hypothesis is concerned only with the
differences among models, ignoring the makes, just as the original null
hypothesis was concerned only with the differences among the makes,
ignoring the models. The F-value of 19.11 for models is significant. The
null hypothesis is untenable; the models differ in gasoline mileage.
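The gain in sensitivity from the double classification can be reproduced numerically. This sketch (names hypothetical) works directly from the Table 63 values, computes the sums of squares for makes and models, and compares the F for makes under the single and the double classification. On those values the arithmetic gives 69.04 for makes, 268.24 for models, and 56.16 for within.

```python
# Table 63: miles per gallon, one car per make and model.
# Rows are models (current to 4 years old); columns are makes I to V.
mileage = [
    [26, 22, 22, 24, 18],
    [24, 21, 20, 20, 20],
    [22, 18, 19, 19, 16],
    [20, 15, 17, 13, 15],
    [14, 12, 11, 18, 12],
]
n_models, n_makes = len(mileage), len(mileage[0])
n = n_models * n_makes                                  # 25
correction = sum(map(sum, mileage)) ** 2 / n            # (458)^2 / 25

make_totals = [sum(row[j] for row in mileage) for j in range(n_makes)]
model_totals = [sum(row) for row in mileage]

ss_total = sum(x * x for row in mileage for x in row) - correction
ss_makes = sum(t * t for t in make_totals) / n_models - correction
ss_models = sum(t * t for t in model_totals) / n_makes - correction
ms_makes = ss_makes / (n_makes - 1)

# Single classification by make: model differences stay inside "within"
ss_within_one_way = ss_total - ss_makes
f_makes_one_way = ms_makes / (ss_within_one_way / (n - n_makes))

# Double classification: model differences are removed from "within"
ss_within_two_way = ss_total - ss_makes - ss_models
ms_within_two_way = ss_within_two_way / ((n_makes - 1) * (n_models - 1))
f_makes = ms_makes / ms_within_two_way
f_models = (ss_models / (n_models - 1)) / ms_within_two_way

print(f"S.S.: makes {ss_makes:.2f}, models {ss_models:.2f}, within {ss_within_two_way:.2f}")
print(f"F for makes: single {f_makes_one_way:.2f}, double {f_makes:.2f}")
print(f"F for models: {f_models:.2f}")
```

The numerator for makes is the same in both analyses; only the within mean square shrinks, from 16.22 to 3.51, which is what makes the second test more sensitive.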


TABLE 66. Gasoline Mileage for 100 Cars of 5 Makes and 5 Models 


                           MAKE
MODEL            I     II    III    IV     V    TOTAL
Current         26     22     22    24    18
                25     23     21    22    20
                25     20     22    20    19
                26     21     21    21    18
  Subtotal     102     86     86    87    75     436
1 Year Old      24     21     20    20    20
                25     18     20    20    18
                23     20     18    18    17
                22     19     19    19    18
  Subtotal      94     78     77    77    73     399
2 Years Old     22     18     19    19    16
                22     20     18    20    18
                20     16     17    18    16
                19     17     15    18    17
  Subtotal      83     71     69    75    67     365
3 Years Old     20     15     17    13    15
                18     15     16    18    16
                18     14     16    17    17
                17     13     15    18    17
  Subtotal      73     57     64    66    65     325
4 Years Old     14     12     11    18    12
                16     10     15    18    15
                16     14     12    14    17
                15     13     13    19    17
  Subtotal      61     49     51    69    61     291
Total          413    341    347   374   341    1816


The automotive engineer could redesign the study to obtain more infor- 
mation than is available from the foregoing analysis by including more 
than one automobile of each make and model. Hence three additional 
automobiles of each make and model were test driven and the gasoline 
mileages for these and for the first 25 are shown in Table 66. 

Again it is possible to test a null hypothesis concerning the models and 
one concerning the makes. However, a new source of variation can be 
isolated. By having more than one automobile of each model and make, 
the interaction between make and model can be tested. The interaction 
of make and model represents the discrepancy that appears because the 
makes do not maintain the same differences in gasoline mileage from one 
model to another. Thus, if one make has a constant mileage throughout the 
different models, a second make has an improving gasoline mileage, and 
still a third has a decreasing gasoline mileage, then the sum of squares due 


TABLE 67. Mean Gasoline Mileages of Five Makes and Five Models 
of Automobiles 

                                    MAKE
MODEL              I       II      III      IV       V     TOTAL
Current          25.50   21.50   21.50   21.75   18.75   21.80
1 Year Old       23.50   19.50   19.25   19.25   18.25   19.95
2 Years Old      20.75   17.75   17.25   18.75   16.75   18.25
3 Years Old      18.25   14.25   16.00   16.50   16.25   16.25
4 Years Old      15.25   12.25   12.75   17.25   15.25   14.55
Total            20.65   17.05   17.35   18.70   17.05   18.16 


to interaction will be large. Interaction is that lack of uniformity of 
performance found among the models of the various makes. 

The mean gasoline mileages for makes and models are shown in Table 
67. For all 100 automobiles, the mean gasoline mileage is 18.16. The sum 
of squares for makes represents those differences among the border means 
for make, i.e., 20.65, 17.05, 17.35, 18.70, and 17.05, whereas the sum of 
squares for model represents those differences among the border means for 
model, i.e., 21.80, 19.95, 18.25, 16.25 and 14.55. 
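As a check on Table 67, the cell and border means can be recomputed directly from the Table 66 mileages. The short Python sketch below does this; the names `MILEAGE`, `cell_means`, `model_means`, and `make_means` are illustrative choices rather than notation from the text, and storing the table as one list of four cars per model-make subgroup is an assumption about layout.

```python
# Table 66 mileages: MILEAGE[model][make] holds the four cars tested for that
# model (Current ... 4 Years Old) and make (I ... V).
MODELS = ["Current", "1 Year Old", "2 Years Old", "3 Years Old", "4 Years Old"]
MILEAGE = [
    [[26, 25, 25, 26], [22, 23, 20, 21], [22, 21, 22, 21], [24, 22, 20, 21], [18, 20, 19, 18]],
    [[24, 25, 23, 22], [21, 18, 20, 19], [20, 20, 18, 19], [20, 20, 18, 19], [20, 18, 17, 18]],
    [[22, 22, 20, 19], [18, 20, 16, 17], [19, 18, 17, 15], [19, 20, 18, 18], [16, 18, 16, 17]],
    [[20, 18, 18, 17], [15, 15, 14, 13], [17, 16, 16, 15], [13, 18, 17, 18], [15, 16, 17, 17]],
    [[14, 16, 16, 15], [12, 10, 14, 13], [11, 15, 12, 13], [18, 18, 14, 19], [12, 15, 17, 17]],
]


def mean(values):
    return sum(values) / len(values)


# Cell means: one per model-make subgroup (the body of Table 67).
cell_means = [[mean(cell) for cell in row] for row in MILEAGE]
# Border means for models pool the 20 cars of each model (rows).
model_means = [mean([x for cell in row for x in cell]) for row in MILEAGE]
# Border means for makes pool the 20 cars of each make (columns).
make_means = [mean([x for row in MILEAGE for x in row[j]]) for j in range(5)]
grand_mean = mean([x for row in MILEAGE for cell in row for x in cell])

for name, row, m in zip(MODELS, cell_means, model_means):
    print(f"{name:<12}" + "".join(f"{v:7.2f}" for v in row) + f"{m:8.2f}")
print(f"{'Total':<12}" + "".join(f"{v:7.2f}" for v in make_means) + f"{grand_mean:8.2f}")
```

Because every subgroup contains four cars, the general mean is also the mean of the 25 cell means.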

The interaction present can be identified by noting the table entries 
after all border means for makes and models have been adjusted to a 
gasoline consumption of 18.16 miles per gallon. In the case of the current 
model, the border mean of 21.80 differs from the general mean by the 
amount 18.16 − 21.80, or −3.64. To make the border mean equal the 
general mean, this value must be applied according to sign to the mean 
gasoline mileages of the current model of each make, i.e., 25.50, 21.50, 
21.50, 21.75, and 18.75. 

Thus the adjustments for the current model are as follows: 


Make I      25.50 − 3.64 = 21.86 
Make II     21.50 − 3.64 = 17.86 
Make III    21.50 − 3.64 = 17.86 
Make IV     21.75 − 3.64 = 18.11 
Make V      18.75 − 3.64 = 15.11 


This procedure is repeated for all other models, and the adjusted means 
are listed in Table 68. 


TABLE 68. Gasoline Mileage Means after First Adjustment 

                                    MAKE
MODEL              I       II      III      IV       V     TOTAL
Current          21.86   17.86   17.86   18.11   15.11   18.16
1 Year Old       21.71   17.71   17.46   17.46   16.46   18.16
2 Years Old      20.66   17.66   17.16   18.66   16.66   18.16
3 Years Old      20.16   16.16   17.91   18.41   18.16   18.16
4 Years Old      18.86   15.86   16.36   20.86   18.86   18.16
Total            20.65   17.05   17.35   18.70   17.05   18.16 


It is readily apparent that the adjustment of the model border means 
to equal the general mean has not changed the make border means. 

To adjust the make border means so that each is equal to the general 
mean, the method used to adjust the model means is repeated. An adjust- 
ment term is found by subtracting the border mean from the general 
mean and is applied according to sign to the appropriate table entries. 
The results are shown in Table 69. 


TABLE 69. Gasoline Mileage Means after Second Adjustment 

                                    MAKE
MODEL              I       II      III      IV       V     TOTAL
Current          19.37   18.97   18.67   17.57   16.22   18.16
1 Year Old       19.22   18.82   18.27   16.92   17.57   18.16
2 Years Old      18.17   18.77   17.97   18.12   17.77   18.16
3 Years Old      17.67   17.27   18.72   17.87   19.27   18.16
4 Years Old      16.37   16.97   17.17   20.32   19.97   18.16
Total            18.16   18.16   18.16   18.16   18.16   18.16 


As a result of the two adjustments of the means, all border means are 
equal to the general mean. If the sum of squares for makes and for models 
were to be computed these sums would be found to be zero. The remain- 
ing differences among the 25 means in Table 69 represent the interaction 
between make and model. The sum of squares for the subgroups is now 
equal to the sum of squares for interaction. The sum of squares for inter- 
action can be computed from the means in Table 69 by applying the 
formula, 


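$$SS_{\text{int}} = k\left(\bar{X}_1^2 + \bar{X}_2^2 + \cdots + \bar{X}_{25}^2\right) - N\bar{X}_{\text{total}}^2$$

Substituting,

$$SS_{\text{int}} = 4\left[(19.37)^2 + (19.22)^2 + \cdots + (19.97)^2\right] - (100)(18.16)^2 = 106.86$$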
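The two adjustments and the interaction sum of squares can be reproduced in a few lines of Python. This sketch starts from the Table 67 cell means, shifts each model row and then each make column so that every border mean equals 18.16 (Tables 68 and 69), and applies the formula above with k = 4 and N = 100. The helper names `row_means` and `col_means` are hypothetical.

```python
# Cell means from Table 67 (rows: models Current ... 4 Years Old; columns: makes I ... V).
CELL_MEANS = [
    [25.50, 21.50, 21.50, 21.75, 18.75],
    [23.50, 19.50, 19.25, 19.25, 18.25],
    [20.75, 17.75, 17.25, 18.75, 16.75],
    [18.25, 14.25, 16.00, 16.50, 16.25],
    [15.25, 12.25, 12.75, 17.25, 15.25],
]
K = 4    # automobiles per subgroup
N = 100  # automobiles in all


def row_means(table):
    return [sum(row) / len(row) for row in table]


def col_means(table):
    return [sum(col) / len(col) for col in zip(*table)]


grand = sum(map(sum, CELL_MEANS)) / 25  # equal subgroup sizes, so this is 18.16

# First adjustment (Table 68): shift each model row so its border mean equals the general mean.
first = [[x + (grand - rm) for x in row]
         for row, rm in zip(CELL_MEANS, row_means(CELL_MEANS))]
# Second adjustment (Table 69): shift each make column in the same way.
cms = col_means(first)
second = [[x + (grand - cm) for x, cm in zip(row, cms)] for row in first]

# All border means now equal the general mean; what remains is interaction.
ss_interaction = K * sum(x * x for row in second for x in row) - N * grand ** 2
print(round(ss_interaction, 2))
```

The first adjustment leaves the make border means unchanged, because the row shifts sum to zero within every column; this is the point made in the text after Table 68.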


If no interaction existed, all of the 25 means would equal the general 
mean and the sum of squares for interaction would be zero. Such an 
occurrence is extremely unusual. The information of interest to the 
investigator is the likelihood that the differences that do exist among 
the means in the body of the table could have resulted from an accident 
of sampling. 

The foregoing adjustment of the means is not an essential part of the 
calculations of the analysis of variance with multiple classification. It is 
only a device to identify more directly the meaning of interaction. The 
calculations of the analysis of variance are made from the data obtained 
in Table 66. In general, the sums of squares are found in the same manner 
as those in Table 65. 

The sum of squares for total is found as usual: 


$$SS_t = \sum X^2 - \frac{(\sum X)^2}{N} = (26)^2 + (25)^2 + (25)^2 + \cdots + (17)^2 - \frac{(1816)^2}{100} = 1111.44$$


The sum of squares for make is: 


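$$SS_{\text{make}} = \frac{(\sum X_1)^2}{k_1} + \frac{(\sum X_2)^2}{k_2} + \frac{(\sum X_3)^2}{k_3} + \frac{(\sum X_4)^2}{k_4} + \frac{(\sum X_5)^2}{k_5} - \frac{(\sum X)^2}{N}$$

$$= \frac{(413)^2}{20} + \frac{(341)^2}{20} + \frac{(347)^2}{20} + \frac{(374)^2}{20} + \frac{(341)^2}{20} - \frac{(1816)^2}{100} = 192.24$$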


The sum of squares for model is: 


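1 Since the sum of squares for subgroups is usually computed from the formula 

$$SS = \frac{(\sum X_1)^2}{k_1} + \frac{(\sum X_2)^2}{k_2} + \cdots + \frac{(\sum X_{25})^2}{k_{25}} - \frac{(\sum X)^2}{N}$$

and since $\bar{X}_1 = \frac{\sum X_1}{k_1}$, $\bar{X}_2 = \frac{\sum X_2}{k_2}$, etc., so that $\frac{(\sum X_1)^2}{k_1} = k_1\bar{X}_1^2$, and since $\frac{(\sum X)^2}{N} = N\bar{X}_{\text{total}}^2$, 
then upon substitution, 

$$SS = \left(k_1\bar{X}_1^2 + k_2\bar{X}_2^2 + \cdots + k_{25}\bar{X}_{25}^2\right) - N\bar{X}_{\text{total}}^2$$

If $k_1 = k_2 = \cdots = k_{25} = k$, 
then 

$$SS = k\left(\bar{X}_1^2 + \bar{X}_2^2 + \cdots + \bar{X}_{25}^2\right) - N\bar{X}_{\text{total}}^2$$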

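$$SS_{\text{model}} = \frac{(\sum X_1)^2}{k_1} + \frac{(\sum X_2)^2}{k_2} + \frac{(\sum X_3)^2}{k_3} + \frac{(\sum X_4)^2}{k_4} + \frac{(\sum X_5)^2}{k_5} - \frac{(\sum X)^2}{N}$$

$$= \frac{(436)^2}{20} + \frac{(399)^2}{20} + \frac{(365)^2}{20} + \frac{(325)^2}{20} + \frac{(291)^2}{20} - \frac{(1816)^2}{100} = 662.84$$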


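The sum of squares for interaction is: 

$$SS_{\text{int}} = \left[\frac{(\sum X_1)^2}{k_1} + \frac{(\sum X_2)^2}{k_2} + \cdots + \frac{(\sum X_{25})^2}{k_{25}} - \frac{(\sum X)^2}{N}\right] - (SS_{\text{make}} + SS_{\text{model}})$$

$$= \left[\frac{(102)^2}{4} + \frac{(94)^2}{4} + \cdots + \frac{(61)^2}{4} - \frac{(1816)^2}{100}\right] - (192.24 + 662.84) = 106.86$$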

The sum of squares for interaction is determined by first squaring each 
of the 25 subgroup totals, that is, the total of the mileages for the four 
automobiles of the same make and model. These squared totals are then divided 
by four since each total consists of four mileages. The familiar correction 
term is subtracted and the resulting sum of squares is that for make, 
model, and interaction. To find the sum of squares for interaction alone, 
the previously computed sums of squares for make and for model are 
subtracted as indicated. It should be noted that the value of the sum of 
squares found by this conventional method is identical with that resulting 
from computations based upon the means in Table 69. 

After these three sources of variation have been found, the part of the 
total sum of squares still remaining represents the individual differences 
among automobiles of the same make and model. The value can be most 
readily obtained by subtraction. 

Thus the sum of squares for within is: 

$$SS_w = SS_t - (SS_{\text{make}} + SS_{\text{model}} + SS_{\text{int}}) = 1111.44 - (192.24 + 662.84 + 106.86) = 149.50$$


The within sum of squares can also be computed by subtracting the 
sum of squares of the subgroups from the sum of squares for total. When 
simplified, the equation becomes: 


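$$SS_w = \sum X^2 - \left[\frac{(\sum X_1)^2}{k_1} + \frac{(\sum X_2)^2}{k_2} + \cdots + \frac{(\sum X_{25})^2}{k_{25}}\right]$$

$$= (26)^2 + (25)^2 + (25)^2 + \cdots + (17)^2 - \left[\frac{(102)^2}{4} + \frac{(94)^2}{4} + \cdots + \frac{(61)^2}{4}\right] = 149.50$$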
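The conventional computations just described, starting from the raw Table 66 mileages, can be sketched in Python as follows. The helper `ss_from_totals` (a hypothetical name) squares each group total, divides by the number of scores in that group, sums the quotients, and subtracts the correction term. Applied to models, makes, and the 25 subgroups, it yields every component, and the within sum of squares follows by subtraction.

```python
# Table 66 mileages: MILEAGE[model][make] is the list of four cars of that
# model (Current ... 4 Years Old) and make (I ... V).
MILEAGE = [
    [[26, 25, 25, 26], [22, 23, 20, 21], [22, 21, 22, 21], [24, 22, 20, 21], [18, 20, 19, 18]],
    [[24, 25, 23, 22], [21, 18, 20, 19], [20, 20, 18, 19], [20, 20, 18, 19], [20, 18, 17, 18]],
    [[22, 22, 20, 19], [18, 20, 16, 17], [19, 18, 17, 15], [19, 20, 18, 18], [16, 18, 16, 17]],
    [[20, 18, 18, 17], [15, 15, 14, 13], [17, 16, 16, 15], [13, 18, 17, 18], [15, 16, 17, 17]],
    [[14, 16, 16, 15], [12, 10, 14, 13], [11, 15, 12, 13], [18, 18, 14, 19], [12, 15, 17, 17]],
]

scores = [x for row in MILEAGE for cell in row for x in cell]
N = len(scores)                      # 100 automobiles
correction = sum(scores) ** 2 / N    # the familiar correction term, (sum X)^2 / N


def ss_from_totals(groups):
    """Sum of (group total)^2 / (group size), minus the correction term."""
    return sum(sum(g) ** 2 / len(g) for g in groups) - correction


model_groups = [[x for cell in row for x in cell] for row in MILEAGE]
make_groups = [[x for row in MILEAGE for x in row[j]] for j in range(5)]
subgroups = [cell for row in MILEAGE for cell in row]

ss_total = sum(x * x for x in scores) - correction
ss_make = ss_from_totals(make_groups)
ss_model = ss_from_totals(model_groups)
ss_subgroups = ss_from_totals(subgroups)              # make + model + interaction
ss_interaction = ss_subgroups - (ss_make + ss_model)
ss_within = ss_total - ss_subgroups                   # sum X^2 minus the subgroup terms

for name, value in [("Total", ss_total), ("Make", ss_make), ("Model", ss_model),
                    ("Interaction", ss_interaction), ("Within", ss_within)]:
    print(f"{name:<12}{value:9.2f}")
```

Computing the within sum of squares as total minus subgroups is algebraically the same as the simplified formula above, since the correction term cancels.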


The degrees of freedom for the main effects of make and model are 
four, one less than the number of categories within the classification. The 

number of degrees of freedom for interaction is the product of the degrees 
of freedom of the main effects included in the interaction. In the case of 
the interaction, the degrees of freedom are (4)(4) or 16. The unexplained 
(within) degrees of freedom are 99 − (4 + 4 + 16) = 75. The complete analysis 
of variance is shown in Table 70. 


TABLE 70. Analysis of Variance of Gasoline Mileages 

SOURCE OF       DEGREES OF     SUM OF      MEAN
VARIATION        FREEDOM       SQUARES     SQUARE
Make                 4          192.24      48.06
Model                4          662.84     165.71
Interaction         16          106.86       6.68
Within              75          149.50       1.99
Total               99         1111.44

For make, $F_{4,75} = \frac{48.06}{1.99} = 24.1$

For model, $F_{4,75} = \frac{165.71}{1.99} = 83.27$

For interaction, $F_{16,75} = \frac{6.68}{1.99} = 3.36$


The F-values are found in a systematic manner. In the case of all 
three of the hypotheses to be tested, the mean square for within subgroups 
is used as the denominator when computing the F-values.1 
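The mean squares and F-values of Table 70 follow mechanically from the sums of squares and degrees of freedom. The Python sketch below takes those table entries as given (the dictionary name `EFFECTS` is only illustrative) and divides each effect's mean square by the within-subgroups mean square. Carrying the within mean square unrounded (1.9933 rather than 1.99) gives F-values of about 24.11, 83.13, and 3.35; the small differences from Table 70 come only from that rounding.

```python
# Sums of squares and degrees of freedom as reported in Table 70.
EFFECTS = {
    "Make": (192.24, 4),
    "Model": (662.84, 4),
    "Interaction": (106.86, 16),
}
SS_WITHIN, DF_WITHIN = 149.50, 75

ms_within = SS_WITHIN / DF_WITHIN  # 1.9933..., printed as 1.99 in the table
f_values = {}
for source, (ss, df) in EFFECTS.items():
    ms = ss / df
    # Every effect is tested against the within-subgroups mean square.
    f_values[source] = ms / ms_within
    print(f"{source:<12} MS = {ms:7.2f}   F({df},{DF_WITHIN}) = {f_values[source]:6.2f}")
```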

All of the F-values are significant. Therefore, in the case of the two 
main effects, evidence once again has been found that the various makes 
differ in regard to gasoline mileage, and so do the various models. The 
significant F-value for interaction can be interpreted to mean that there 
was little uniformity of gasoline mileages among models of the various 
makes. In other words, the differences in miles per gallon among the 
models were not the same for every make. 

Inspection of the 25 subgroup means in Table 67 suggests even more 
specifically the lack of uniformity which caused the significant 
interaction. Although the mileages of all makes tended to decrease as the 
models became less recent, it is apparent that the mileages of makes I, 
II, and III decreased at a more rapid rate than did the mileages of 

1 It should be noted that various statisticians have expressed differences of opinion 
concerning the selection of the proper denominator for the F-equation in a multiple 
classification analysis of variance. Detailed information concerning two additional 
methods of computing and interpreting F-values in such an analysis can be found in 
the following references: 

G. W. Snedecor, Statistical Methods, 4th ed. (Ames, Iowa, Iowa State College, 
1946), Chap. 11. 

P. O. Johnson, Statistical Methods in Research (New York, Prentice-Hall, Inc., 
1949), Chap. 13. 

makes IV and V. This very evident failure of the models of the various 
makes to maintain consistent mileages exemplifies an interaction yielding 
a significant F-value. 

In the original analysis a single automobile of each model and make 
was considered. Since investigations involving a single individual in a 
subgroup are dangerously inadequate because of the wide variability which 
exists among individuals, they are seldom, if ever, undertaken in 
educational and psychological research. Satisfactory research dealing with 
individuals requires a sufficient number of individuals in each subgroup 
so that an adequate estimate can be obtained of the individual differences 
present. 

In the example shown in Table 71 there were ten boys and ten girls, 
none of whom were blood relatives, in each of the freshman, sophomore, 
junior, and senior classes, a total of eighty pupils to whom a home 
adjustment scale was administered. The higher the home adjustment score, the 
more satisfactory was the home adjustment. 

TABLE 71. Sums of Home Adjustment Scores for All Subclasses 

                          CLASS LEVEL
SEX       FRESHMAN   SOPHOMORE   JUNIOR   SENIOR   TOTAL
Boys         110        100        120      130      460
Girls        110        120        125      135      490
Total        220        220        245      265      950

ΣX² = 11,877 

The null hypotheses which may be tested are, first, that high school boys 
and girls do not differ in their home adjustment; second, that there is no 
difference among class levels in home adjustment; and third, that the sexes 
do not react differently in home adjustment at different high school class levels. 

Sum of Squares for Total: Each pupil’s test score was squared. The 
squares were summed for all 80 pupils. From this sum a correction term 
was subtracted. This correction term was found by summing the scores 
for all pupils, squaring this sum, and dividing the square by 80, 
the number of pupils. The result was 595.75. 

General Formula: $SS_t = \sum X^2 - \frac{(\sum X)^2}{N}$

Substitution: $SS_t = 11{,}877 - \frac{(950)^2}{80} = 595.75$


Sum of Squares for Sex: The sum of the scores was found for each sex, 
disregarding class level. The sum for boys was squared and divided by 
the number of boys. The sum for girls was squared and divided by the 

number of girls. These quotients were summed and the correction term 
subtracted. The result was 11.25. 


General Formula: $SS_s = \frac{(\sum X_b)^2}{k_b} + \frac{(\sum X_g)^2}{k_g} - \frac{(\sum X)^2}{N}$

Substitution: $SS_s = \frac{(460)^2}{40} + \frac{(490)^2}{40} - \frac{(950)^2}{80} = 11.25$


Sum of Squares for Class Level: The sum of scores for each class 
level was found, disregarding sex. The sum for each class level was 
squared and divided by the number of students in that class level. The 
quotients were summed and the correction term subtracted. The result 
was 71.25. 

General Formula: $SS_c = \frac{(\sum X_1)^2}{k_1} + \frac{(\sum X_2)^2}{k_2} + \frac{(\sum X_3)^2}{k_3} + \frac{(\sum X_4)^2}{k_4} - \frac{(\sum X)^2}{N}$

Substitution: $SS_c = \frac{(220)^2}{20} + \frac{(220)^2}{20} + \frac{(245)^2}{20} + \frac{(265)^2}{20} - \frac{(950)^2}{80} = 71.25$


Sum of Squares for Subgroups: The sum of the scores for each of the 
sexes at each class level was squared and divided by the number of 
students in the subgroup. The quotients were summed and the correction 
term subtracted, yielding 93.75. This value represented the sum of squares 
for sex, class level, and interaction. To obtain the sum of squares for 
interaction alone, the sums of squares for class level and for sex were 
subtracted from 93.75, yielding 11.25. 


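General Formula: $SS_{sg} = \frac{(\sum X_1)^2}{k_1} + \frac{(\sum X_2)^2}{k_2} + \cdots + \frac{(\sum X_8)^2}{k_8} - \frac{(\sum X)^2}{N}$

Substitution: $SS_{sg} = \frac{(110)^2}{10} + \frac{(100)^2}{10} + \cdots + \frac{(135)^2}{10} - \frac{(950)^2}{80} = 93.75$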


Sum of Squares for Interaction = 93.75 — (11.25 + 71.25) = 11.25. 
Sum of Squares for Within: This represents the part of the total sum of 
squares not accounted for by sex, class level, or interaction, or 


S.S. = 595.75 — (11.25 + 71.25 + 11.25) = 502.00 


Degrees of Freedom: The number of degrees of freedom for total is 
79—one less than the total number of pupils; for sex it is one—one less 
than the number of sexes; for class level it is three—one less than the 
number of class levels; for interaction it is three—the product of the 
number of degrees of freedom for sex and class level; and for within 
it is the difference between that of the total and those of sex, class level, 
and interaction combined. 
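Because only the subgroup totals of Table 71 and the overall ΣX² = 11,877 are reported, the whole analysis can be carried out from those figures alone. The Python sketch below does so under that assumption; the names `CELL_TOTALS`, `N_PER_CELL`, and `SUM_SQ` are illustrative.

```python
# Subgroup totals from Table 71 (ten pupils per subgroup) and sum X^2 for all 80 pupils.
CELL_TOTALS = {
    "Boys": [110, 100, 120, 130],   # freshman, sophomore, junior, senior
    "Girls": [110, 120, 125, 135],
}
N_PER_CELL = 10
SUM_SQ = 11_877

cells = [t for row in CELL_TOTALS.values() for t in row]
N = N_PER_CELL * len(cells)                 # 80 pupils
correction = sum(cells) ** 2 / N            # (950)^2 / 80

sex_totals = [sum(row) for row in CELL_TOTALS.values()]
class_totals = [sum(col) for col in zip(*CELL_TOTALS.values())]
n_sex = N // len(sex_totals)                # 40 pupils of each sex
n_class = N // len(class_totals)            # 20 pupils at each class level

ss_total = SUM_SQ - correction
ss_sex = sum(t ** 2 for t in sex_totals) / n_sex - correction
ss_class = sum(t ** 2 for t in class_totals) / n_class - correction
ss_subgroups = sum(t ** 2 for t in cells) / N_PER_CELL - correction
ss_interaction = ss_subgroups - (ss_sex + ss_class)
ss_within = ss_total - ss_subgroups

df = {"Sex": 1, "Class Level": 3, "Interaction": 3}
df_within = (N - 1) - sum(df.values())      # 79 - 7 = 72
ms_within = ss_within / df_within
for source, ss in [("Sex", ss_sex), ("Class Level", ss_class), ("Interaction", ss_interaction)]:
    ms = ss / df[source]
    print(f"{source:<12} SS = {ss:6.2f}  MS = {ms:5.2f}  F = {ms / ms_within:4.2f}")
```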

Mean Squares and F-Values: The mean squares were computed in 
the usual manner by dividing the sum of squares by the corresponding 
number of degrees of freedom. The denominator for all F-values to be 
computed is the within subgroups mean square. The complete analysis of 
variance is shown in Table 72. 


TABLE 72. Analysis of Variance of Home Adjustment Scores 

SOURCE OF       DEGREES OF     SUM OF      MEAN
VARIATION        FREEDOM       SQUARES     SQUARE
Sex                  1           11.25      11.25
Class Level          3           71.25      23.75
Interaction          3           11.25       3.75
Within              72          502.00       6.97
Total               79          595.75

For sex, $F_{1,72} = \frac{11.25}{6.97} = 1.61$

For class level, $F_{3,72} = \frac{23.75}{6.97} = 3.41$

For interaction, $F_{3,72} = \frac{3.75}{6.97} = 0.54$

Only the F-value for class level is significant. 

Interpretation: Although the girls rated higher than the boys in home 
adjustment, as shown in Table 71, the difference was not significant, and 
hence it has not been possible to refute the first null hypothesis. Evidence 
was found to refute the second null hypothesis, since the tendency for home 
adjustment scores to increase with class level that was noted in the sample 
is probably also a characteristic of the population. The third null hypothesis 
was not rejected, since no satisfactory evidence was produced to indicate 
that the sexes react differently in home adjustment at different 
high school class levels. 


TRIPLE CLASSIFICATION 


In the foregoing description, only one school was included. If the study 
had included, for example, five schools, there might be school differences 
as a source of variation also. The sources of variation and the numbers of 
degrees of freedom in such a case are shown in Table 73. The methods of 
computing the sums of squares, assigning the correct numbers of degrees of 
freedom, and making the appropriate interpretations can be readily inferred. 

Total: The sum of squares for total is found as usual: the sum of the 
squares of each of the scores of the 400 pupils minus the familiar 
correction term. The number of degrees of freedom is 399, one less than the 
number of pupils. 

Schools: The sum of squares for schools is found in the usual manner 
by disregarding the classification of sex and class level. The number of 
degrees of freedom is four, one less than the number of schools. 

Sex: The sum of squares for sex is found by disregarding school and 
class level. The number of degrees of freedom is one, one less than the 
number of sexes. 

Class Level: The sum of squares for class level is found by disregarding 
sex and school. The number of degrees of freedom is three, one less 
than the number of class levels. 


TABLE 73. Analysis of Variance of Home Adjustment 
Scores: Five High Schools 

                                    DEGREES OF    SUM OF     MEAN
SOURCE OF VARIATION                  FREEDOM      SQUARES    SQUARE
School                                   4
Class Level                              3
Sex                                      1
School X Sex†                            4
School X Class Level                    12
Sex X Class Level                        3
School X Sex X Class Level              12
Within Subgroups                       360
Total                                  399

† The designation “school X sex” may be expressed as “school by sex.” 
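The degrees of freedom in Table 73 follow a single rule: a main effect has one less than its number of categories, and an interaction has the product of the degrees of freedom of the effects it involves. The Python sketch below applies that rule, assuming (as the text does) 5 schools, 2 sexes, 4 class levels, and 10 pupils per subgroup; the dictionary names are illustrative.

```python
from itertools import combinations
from math import prod

# Design assumed from the text: 5 schools x 2 sexes x 4 class levels, 10 pupils per subgroup.
LEVELS = {"School": 5, "Sex": 2, "Class Level": 4}
PER_CELL = 10

n_total = prod(LEVELS.values()) * PER_CELL   # 400 pupils
df = {}
for r in range(1, len(LEVELS) + 1):
    for combo in combinations(LEVELS, r):
        # Main effect: categories - 1; interaction: product of its factors' df.
        df[" X ".join(combo)] = prod(LEVELS[f] - 1 for f in combo)
df["Within Subgroups"] = n_total - prod(LEVELS.values())  # 400 pupils - 40 subgroups
df["Total"] = n_total - 1

for source, d in df.items():
    print(f"{source:<28}{d:>4}")
```

The listed components add up to the 399 total degrees of freedom, which is a quick check on any table of this kind.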


Sex X School Interaction: The sex by school interaction, being the 
interaction between two main effects, is known as a first-order interaction. 
The sum of squares for sex by school interaction is found by squaring 
the sum of the home adjustment scores for each sex in each school. 
These squares are summed and divided by 40, the number of pupils of 
each sex in any given school. From this quotient, the correction term is 
subtracted. The value obtained represents the sum of squares for 
interaction of sex and school, together with those for the main effects, sex and 
school, which have been previously determined. The sum of squares for 
interaction may be found by subtracting the sums of squares for sex and 
schools. The number of degrees of freedom is four, the product of the 
number of degrees of freedom for sex and for schools. 

School X Class Level: The sum of squares for school by class-level 
interaction is found by squaring the sum of the home adjustment scores 
for each class level in each school disregarding sex. The squares are 
summed and divided by 20, the number of pupils in each class level in 
each school. From this quotient, the correction term is subtracted. The 
value obtained represents the sum of squares for interaction together 
with those for the main effects of school and of class level. The interaction 

sum of squares may be found by subtracting the sums of squares for the 
main effects, school and class level. The number of degrees of freedom 
for interaction is 12, the product of the number of degrees of freedom 
for school and class level. 

Sex x Class-Level Interaction: The sum of squares for sex by class- 
level interaction is found by squaring the sum of the home adjustment 
scores for each sex at each class level but disregarding schools. These 
squares are summed and divided by 50, the number of pupils represented 
by each sum. From this quotient, the correction term is subtracted. The 
value obtained represents the sum of squares for interaction together 
with those for the main effects, sex and class level. The interaction sum of 
squares is found by subtracting the sums of squares for the main effects. The 
number of degrees of freedom is three, the product of the number of degrees 
of freedom for sex and for class level. 

School X Sex X Class-Level Interaction: The school by sex by class- 
level interaction, being the interaction between three main effects, is 
known as a second-order interaction. The sum of squares for school X 
sex X class-level interaction is found by squaring the sum of the home 
adjustment scores for each sex at each class level in each school. These 
squares are summed and divided by 10, the number of pupils represented 
in each sum. From this quotient, the correction term is subtracted. The 
value obtained represents the sum of squares for school X sex X class- 
level interaction together with those for the three main effects and the 
three first-order interactions. The desired second-order interaction is 
found by subtracting these six sums of squares. The number of degrees 
of freedom is 12, the product of the number of degrees of freedom for the 
three main effects. 

Within Subgroups: The sum of squares for within subgroups can be 
obtained most readily by subtracting from the sum of squares for total, 
those for the three main effects, the three first-order interactions and 
the second-order interaction. The number of degrees of freedom likewise 
can be obtained by subtraction. 

Mean Squares and F-Values: As in all previous examples, the mean 
squares are computed by dividing the sum of squares by the corresponding 
degrees of freedom. All F-values are found by dividing each mean square 
for any main effect or interaction by the mean square for within sub- 
groups. Interpretations are made in the usual manner. 
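No scores are given for the five-school study, so the following Python sketch illustrates the marginal-totals procedure on hypothetical random scores (generated with a fixed seed; they are not data from the text) laid out as 5 schools by 2 sexes by 4 class levels with 10 pupils per subgroup. The helper `ss_for` is a hypothetical name: it pools over the classifications not kept, squares each total, divides by the number of scores in it (40, 20, 50, or 10 as in the text), and subtracts the correction term. The final line checks that the within sum of squares obtained by subtraction equals the pooled variation of scores about their own subgroup means.

```python
import random
from itertools import product

# Hypothetical scores (NOT data from the text): 5 schools x 2 sexes x 4 class levels,
# 10 pupils per subgroup, drawn at random only to illustrate the arithmetic.
random.seed(1)
SCHOOLS, SEXES, LEVELS, PER_CELL = 5, 2, 4, 10
scores = {
    key: [random.randint(10, 30) for _ in range(PER_CELL)]
    for key in product(range(SCHOOLS), range(SEXES), range(LEVELS))
}

all_x = [x for cell in scores.values() for x in cell]
N = len(all_x)
CT = sum(all_x) ** 2 / N  # the familiar correction term


def ss_for(keep):
    """Pool over the classifications not in `keep` (0 = school, 1 = sex,
    2 = class level), square each total, divide by the number of scores
    in it, sum, and subtract the correction term."""
    totals = {}
    for key, cell in scores.items():
        k = tuple(key[i] for i in keep)
        totals[k] = totals.get(k, 0) + sum(cell)
    per_total = N // len(totals)
    return sum(t * t for t in totals.values()) / per_total - CT


ss = {"School": ss_for((0,)), "Sex": ss_for((1,)), "Class Level": ss_for((2,))}
# First-order interactions: two-way SS minus its two main effects.
ss["School X Sex"] = ss_for((0, 1)) - ss["School"] - ss["Sex"]
ss["School X Class Level"] = ss_for((0, 2)) - ss["School"] - ss["Class Level"]
ss["Sex X Class Level"] = ss_for((1, 2)) - ss["Sex"] - ss["Class Level"]
# Second-order interaction: subgroup SS minus the six sums of squares above.
ss["School X Sex X Class Level"] = ss_for((0, 1, 2)) - sum(ss.values())
ss_total = sum(x * x for x in all_x) - CT
ss["Within Subgroups"] = ss_total - sum(ss.values())

# Cross-check: pooled squared deviations of each score from its subgroup mean.
direct_within = sum(
    sum((x - sum(cell) / len(cell)) ** 2 for x in cell) for cell in scores.values()
)
print(round(ss["Within Subgroups"], 6), round(direct_within, 6))
```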


APPLICATION OF ANALYSIS OF VARIANCE 
MULTIPLE CLASSIFICATION TO DESCRIPTIVE 
DATA 


When data have been expressed in descriptive units on a continuum, 
arbitrary code numbers can be assigned after which little difficulty is 
encountered in the analysis of variance in multiple classification. 

In Table 74 data are shown of church attendance of 160 college stu- 
dents when classified by sex and father’s occupation. Code numbers of 4, 
3, 2, 1, and 0 are assigned to regularly, usually, occasionally, seldom, and 
never, respectively. 


TABLE 74. Regularity of Church Attendance of College Students 

                                        OCCUPATION OF FATHER
SEX      ATTENDANCE     CODE   PROFESSIONAL  MERCHANT  FARMER  LABORER   TOTAL
Male     Regularly        4          7           6        3        3        19
         Usually          3          5           6        7        3        21
         Occasionally     2          3           2        4        8        17
         Seldom           1          2           1        2        2         7
         Never            0          3           5        4        4        16
Female   Regularly        4          8           7        5        5        25
         Usually          3          5           7        8        6        26
         Occasionally     2          3           3        4        7        17
         Seldom           1          3           2        3        1         9
         Never            0          1           1        0        1         3
Total                               40          40       40       40       160 


Sum of Squares for total: 
$$SS_t = (19+25)(4)^2 + (21+26)(3)^2 + (17+17)(2)^2 + (7+9)(1)^2 + (16+3)(0)^2 - \frac{\left[(19+25)(4) + (21+26)(3) + (17+17)(2) + (7+9)(1) + (16+3)(0)\right]^2}{160}$$

$$= 1279 - 1005 = 274$$


Sum of Squares for sex: 


$$SS_s = \frac{\left[19(4) + 21(3) + 17(2) + 7(1) + 16(0)\right]^2}{80} + \frac{\left[25(4) + 26(3) + 17(2) + 9(1) + 3(0)\right]^2}{80} - \text{Correction Term}$$

$$= 1015.5 - 1005 = 10.5$$


Sum of Squares for occupation: 


$$SS_o = \frac{\left[(7+8)(4) + (5+5)(3) + (3+3)(2) + (2+3)(1) + (3+1)(0)\right]^2}{40} + \cdots + \frac{\left[(3+5)(4) + (3+6)(3) + (8+7)(2) + (2+1)(1) + (4+1)(0)\right]^2}{40} - \text{Correction Term}$$

$$= 1008.3 - 1005 = 3.3$$


The sum of squares for interaction can be readily found if the number 
of cases in each cell is first multiplied by the appropriate code number 
and the products summed, as shown in Table 75. 

TABLE 75. Coded Sums for Regularity of Church Attendance 
of College Students 

                                        OCCUPATION OF FATHER
SEX      ATTENDANCE     CODE   PROFESSIONAL  MERCHANT  FARMER  LABORER   TOTAL
Male     Regularly        4         28          24       12       12        76
         Usually          3         15          18       21        9        63
         Occasionally     2          6           4        8       16        34
         Seldom           1          2           1        2        2         7
         Never            0          0           0        0        0         0
         Subtotal                   51          47       43       39       180
Female   Regularly        4         32          28       20       20       100
         Usually          3         15          21       24       18        78
         Occasionally     2          6           6        8       14        34
         Seldom           1          3           2        3        1         9
         Never            0          0           0        0        0         0
         Subtotal                   56          57       55       53       221
Total                              107         104       98       92       401 


Sum of Squares for interaction: 


$$SS_{\text{int}} = \left[\frac{(51)^2 + (56)^2 + (47)^2 + (57)^2 + (43)^2 + (55)^2 + (39)^2 + (53)^2}{20} - \text{Correction Term}\right] - (10.5 + 3.3) = 1.15$$


Sum of Squares for within: 
S.S. = 274 — (10.5 + 3.3 + 1.15) = 259.05 
The final analysis of variance is shown in Table 76. 


TABLE 76. Analysis of Variance of Church Attendance Ratings 

SOURCE OF       DEGREES OF     SUM OF      MEAN
VARIATION        FREEDOM       SQUARES     SQUARE
Sex                  1           10.50      10.50
Occupation           3            3.30       1.10
Interaction          3            1.15       0.38
Within             152          259.05       1.70
Total              159          274.00

For sex, $F_{1,152} = \frac{10.50}{1.70} = 6.18$

For occupation, $F_{3,152} = \frac{1.10}{1.70} = 0.65$

For interaction, $F_{3,152} = \frac{0.38}{1.70} = 0.22$

Only the F-value for sex is significant. Therefore, evidence has been 
found that female college students attend church more regularly than do 
male college students. 
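Once code numbers replace the descriptive categories, each cell can be summarized by its count, its sum of codes, and its sum of squared codes, and the usual sums of squares follow. The Python sketch below does this from the Table 74 frequencies; names such as `FREQ`, `cell_stats`, and `ss_between` are hypothetical. Because it carries the correction term at full precision (1005.00625 rather than 1005), it gives an interaction sum of squares of about 1.12 rather than 1.15 and an F for sex of about 6.16 rather than 6.18; the conclusions are unchanged.

```python
# Table 74 frequencies: FREQ[sex][occupation] counts the students giving codes
# 4 (regularly), 3, 2, 1, 0 (never). Occupations: professional, merchant, farmer, laborer.
CODES = [4, 3, 2, 1, 0]
FREQ = {
    "Male":   [[7, 5, 3, 2, 3], [6, 6, 2, 1, 5], [3, 7, 4, 2, 4], [3, 3, 8, 2, 4]],
    "Female": [[8, 5, 3, 3, 1], [7, 7, 3, 2, 1], [5, 8, 4, 3, 0], [5, 6, 7, 1, 1]],
}


def cell_stats(freqs):
    """Return (n, sum of codes, sum of squared codes) for one sex-occupation cell."""
    n = sum(freqs)
    s = sum(f * c for f, c in zip(freqs, CODES))
    s2 = sum(f * c * c for f, c in zip(freqs, CODES))
    return n, s, s2


cells = {(sex, occ): cell_stats(f) for sex, rows in FREQ.items() for occ, f in enumerate(rows)}
N = sum(n for n, _, _ in cells.values())              # 160 students
CT = sum(s for _, s, _ in cells.values()) ** 2 / N    # correction term


def ss_between(group_of):
    """Pool cells into groups, then sum (group total)^2 / n and subtract CT."""
    groups = {}
    for key, (n, s, _) in cells.items():
        g = group_of(key)
        gn, gs = groups.get(g, (0, 0))
        groups[g] = (gn + n, gs + s)
    return sum(s * s / n for n, s in groups.values()) - CT


ss_total = sum(s2 for _, _, s2 in cells.values()) - CT
ss_sex = ss_between(lambda k: k[0])
ss_occ = ss_between(lambda k: k[1])
ss_int = ss_between(lambda k: k) - ss_sex - ss_occ
ss_within = ss_total - ss_sex - ss_occ - ss_int
ms_within = ss_within / (N - len(cells))              # 152 degrees of freedom
print(round(ss_sex, 2), round(ss_occ, 2), round(ss_int, 2), round(ss_within, 2))
print("F for sex:", round(ss_sex / ms_within, 2))
```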


PAIRING OF CASES 


There are instances in research in the social sciences in which the 
analysis of variance can be made a more effective tool by a careful and 
reasonable pairing of the cases involved in the experiment. For example, a 
designated number of individuals is drawn at random for an experimental 
group. The control group is formed by selecting individuals who can be 
matched with the experimental cases on the basis of variables which need 
to be controlled. In this manner any difference between responses of the 
two groups to the criterion will most likely not be attributable to dif- 
ferences between the two groups in respect to the variables used in the 
matching. 

At one time the method of pairing cases was used fairly extensively. 
It represented a positive attempt to remove individual differences 
which might otherwise have been ignored in the investigation. For 
example, if athletes were to be compared to nonathletes with respect to 
their over-all achievement in college, they might be paired on the basis 
of such measurements as scholastic aptitude, reading ability, high school 
grade-point averages, and curriculum. If this pairing were highly effective 
it would remove to some extent those differences in college achievement 
caused by individual differences other than athletic status. Thus those 
differences in achievement found between the members of each pair may 
be more readily designated as the result of athletic status. 

Certainly the process of pairing would be a tedious and painstaking 
task. Great difficulty would be encountered in finding suitable pairs for 
some members of the experimental group. Also, as the number of 
characteristics increases, the difficulty of finding pairs will increase until it 
becomes virtually impossible to find a true matching pair. The 
compromises in selection which inevitably result correspondingly decrease the 
effectiveness of the pairing. The use of pairing in recent years has de- 
clined because of the foregoing undesirable features and the fact that 
control of individual differences which are on a continuum can be more 
easily and more accurately obtained by the analysis of covariance. 

The method of pairing need not necessarily involve two matched per- 
sons as in the case of the previous example of athletes and nonathletes. 
Some of the pairing possibilities are frequently concerned with only one 
person, as a pretest and posttest, first trial and second trial, or one ex- 
perimental condition vs. a second experimental condition. In such experi- 
ments, a specific aspect of each individual is measured twice and the 
individual is, in a sense, in competition with himself. The control for 
purposes of statistical analysis is excellent. The t-test described in an 
earlier chapter can be readily applied to data of this type. 

The treatment of cases paired in an effective manner can be illustrated 
by the following example. An investigator is interested in comparing the 
intelligence of brothers and sisters. Since similarity in age and environmental 
background was considered a desirable control, a random sample 
of ten-year-old fraternal twins of opposite sex who were raised 
together was selected. The hypothesis to be tested was that there is no 
difference in intelligence between the male and female members of pairs of fraternal 
twins. The intelligence quotients of twenty pairs of twins are listed in Table 77. 


TABLE 77. Intelligence Quotients of Fraternal Twins 

TWIN      MALE    FEMALE    TOTAL
  1        117      121       238
  2        101       96       197
  3        124      119       243
  4         94       99       193
  5         89       96       185
  6        103      112       215
  7        117      120       237
  8        100      105       205
  9         86       74       160
 10        112      118       230
 11        104      110       214
 12         95       99       194
 13        134      129       263
 14        118      126       244
 15        117      130       247
 16        105      100       205
 17         99       89       188
 18         84       94       178
 19         71       80       151
 20         86       93       179
Total     2056     2110      4166

ΣX² = 443,518 


The analysis of variance with single classification was computed in 
the usual manner. 


$$SS \text{ for total} = 443{,}518 - \frac{(4166)^2}{40} = 9629.1$$

$$SS \text{ for sex} = \frac{(2056)^2}{20} + \frac{(2110)^2}{20} - \frac{(4166)^2}{40} = 72.9$$

$$SS \text{ for within} = 9629.1 - 72.9 = 9556.2$$


The individual intelligence quotients were used as the criterion. 
Since the t-value in Table 78 is nonsignificant, the null hypothesis is tenable. 
However, the pairing of cases has not been considered in this computation. 

If the pairing has been successful in controlling individual differences 
other than sex which could possibly affect the criterion, a considerable 
part of the within sum of squares can be attributed to differences between 
the pairs of twins and removed from the error term. 


TABLE 78. Analysis of Variance of Intelligence Quotients 

SOURCE OF       DEGREES OF     SUM OF      MEAN
VARIATION        FREEDOM       SQUARES     SQUARE
Sex                  1            72.9       72.9
Within              38          9556.2      251.5
Total               39          9629.1

For sex, $F_{1,38} = \frac{72.9}{251.5} = 0.29$

$t = \sqrt{F} = 0.538$ (with 38 degrees of freedom)
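The single-classification analysis of Table 78, which ignores the pairing, can be sketched in Python as follows. The lists `MALE` and `FEMALE` simply transcribe Table 77; everything else follows the formulas above, with t taken as the square root of F because the sex comparison has one degree of freedom.

```python
from math import sqrt

# Intelligence quotients from Table 77, in twin order 1 ... 20.
MALE = [117, 101, 124, 94, 89, 103, 117, 100, 86, 112,
        104, 95, 134, 118, 117, 105, 99, 84, 71, 86]
FEMALE = [121, 96, 119, 99, 96, 112, 120, 105, 74, 118,
          110, 99, 129, 126, 130, 100, 89, 94, 80, 93]

scores = MALE + FEMALE
N = len(scores)                                   # 40 children
CT = sum(scores) ** 2 / N                         # correction term
ss_total = sum(x * x for x in scores) - CT
ss_sex = (sum(MALE) ** 2 + sum(FEMALE) ** 2) / len(MALE) - CT
ss_within = ss_total - ss_sex                     # pairing ignored
df_within = N - 2                                 # 38

F = ss_sex / (ss_within / df_within)              # sex has 1 degree of freedom
t = sqrt(F)
print(round(ss_total, 1), round(ss_sex, 1), round(ss_within, 1), round(t, 3))
```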


To include the pairing in the analysis, a new source of variation 
entitled “twins” is added. The degrees of freedom will be one less than the 
number of pairs of twins, or 19. To compute the sum of squares due to twins, the 
intelligence quotients for each pair of twins are summed and the sum 
is squared. The resulting squares for the twenty pairs are added and this 
sum is divided by two. The usual correction term is subtracted. 

$$SS \text{ for twins} = \frac{(\sum X_1)^2 + (\sum X_2)^2 + \cdots + (\sum X_{20})^2}{2} - \frac{(\sum X)^2}{N}$$

$$= \frac{(238)^2 + (197)^2 + \cdots + (179)^2}{2} - \frac{(4166)^2}{40} = 9079.1$$
The sum of squares for sex remains the same whereas the sum of squares 
for within becomes 

9629.1 — (72.9 + 9079.1) = 477.1 


The complete analysis of variance is shown in Table 79. 


TABLE 79. Analysis of Variance of Intelligence 
Quotients of Fraternal Twins 

SOURCE OF       DEGREES OF     SUM OF      MEAN
VARIATION        FREEDOM       SQUARES     SQUARE
Sex                  1            72.9       72.9
Twins               19          9079.1     477.85
Within              19           477.1      25.11
Total               39          9629.1

For sex, $F_{1,19} = \frac{72.9}{25.11} = 2.90$

$t = \sqrt{F} = 1.70$ (with 19 degrees of freedom)

For twins, $F_{19,19} = \frac{477.85}{25.11} = 19.03$
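Adding "twins" as a second classification turns the analysis into a two-way layout with one child of each sex per pair. The Python sketch below reproduces Table 79 from the Table 77 data (transcribed as `MALE` and `FEMALE`); the twins sum of squares squares each pair total and divides by two, exactly as described above.

```python
# Intelligence quotients from Table 77, in twin order 1 ... 20.
MALE = [117, 101, 124, 94, 89, 103, 117, 100, 86, 112,
        104, 95, 134, 118, 117, 105, 99, 84, 71, 86]
FEMALE = [121, 96, 119, 99, 96, 112, 120, 105, 74, 118,
          110, 99, 129, 126, 130, 100, 89, 94, 80, 93]

scores = MALE + FEMALE
N = len(scores)                 # 40 children
pairs = len(MALE)               # 20 sets of twins
CT = sum(scores) ** 2 / N

ss_total = sum(x * x for x in scores) - CT
ss_sex = (sum(MALE) ** 2 + sum(FEMALE) ** 2) / pairs - CT
# Each pair total contains two scores, so its square is divided by 2.
ss_twins = sum((m + f) ** 2 for m, f in zip(MALE, FEMALE)) / 2 - CT
ss_within = ss_total - ss_sex - ss_twins

df_sex, df_twins = 1, pairs - 1
df_within = (N - 1) - df_sex - df_twins     # 39 - 1 - 19 = 19
ms_within = ss_within / df_within
F_sex = (ss_sex / df_sex) / ms_within
F_twins = (ss_twins / df_twins) / ms_within
print(round(ss_twins, 1), round(ss_within, 1), round(F_sex, 2), round(F_twins, 2))
```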

The t-value for sex is nonsignificant. There is little evidence that the 
male and female members of pairs of fraternal twins differ in intelligence. 
Note that the t-value of 1.70 could also be obtained by applying the 
t-test for paired observations described in a previous chapter. 
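The equivalence with the paired t-test can be checked directly. The Python sketch below computes the within-pair differences (female minus male) from Table 77, forms t as the mean difference divided by its standard error, and also shows that half the sum of squared deviations of those differences is the within sum of squares of Table 79. The variable names are illustrative; `statistics.stdev` uses the n − 1 divisor.

```python
from math import sqrt
from statistics import mean, stdev

# Intelligence quotients from Table 77, in twin order 1 ... 20.
MALE = [117, 101, 124, 94, 89, 103, 117, 100, 86, 112,
        104, 95, 134, 118, 117, 105, 99, 84, 71, 86]
FEMALE = [121, 96, 119, 99, 96, 112, 120, 105, 74, 118,
          110, 99, 129, 126, 130, 100, 89, 94, 80, 93]

# One difference per set of twins (female minus male).
diffs = [f - m for m, f in zip(MALE, FEMALE)]
n = len(diffs)
d_bar = mean(diffs)
t = d_bar / (stdev(diffs) / sqrt(n))     # paired t with n - 1 = 19 degrees of freedom

# Half the squared deviations of the differences is the within SS of Table 79.
ss_d = sum((d - d_bar) ** 2 for d in diffs)
print(round(t, 2), round(t ** 2, 2), round(ss_d / 2, 1))
```

The square of this t equals the F for sex in Table 79, as expected when the comparison has one degree of freedom.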

It is well to note that the test in Table 79 is more sensitive than the 
test in Table 78 as evidenced by the decrease in the within mean square 
from 251.5 to 25.11. Furthermore, the interpretation of the t-value com- 
puted by the method used in Table 79 is much more meaningful than 
that of the t-value of Table 78 because many pertinent individual dif- 
ferences have been controlled. 

The F-value of 19.03 with 19 and 19 degrees of freedom for twins is 
significant. Therefore, the pairing has been effective. If this F-value were 
nonsignificant, the pairing on the basis of twins would have been no 
better than pairing the boys and girls at random. 


Exercises 


1. In a certain city, a sample of 900 men between the ages of 30 and 35, who
had not identified themselves with any national political party, was asked to
rate themselves on their interest in national political affairs. The ratings ranged
from 1 to 10 inclusive, the former indicating complete indifference to political
speeches, rallies, and so forth, whereas the latter indicated an intense interest in
prevailing national political activities and writings. Other ratings represented
varying degrees of interest between the two extremes. When classified according
to whether they had voted in the last national election and according to amount
of formal education, the subjects responded as follows:

                               EDUCATION STATUS
                         LESS THAN    HIGH SCHOOL,    MORE THAN
VOTING                   HIGH         LESS THAN       HIGH
STATUS      RATING       SCHOOL       COLLEGE         SCHOOL       TOTAL
Yes            1             4            2               1            7
               2             7            4               5           16
               3            16            6               6           28
               4            31           24              19           74
               5            65           54              48          167
               6            19           31              33           83
               7             4           15              20           39
               8             1            9              12           22
               9             2            1               5            8
              10             1            4               1            6
            Subtotal       150          150             150          450
No             1            16           10               3           29
               2            14            9               9           32
               3            41           29              11           81
               4            59           24              18          101
               5            12           56              53          121
               6             3           14              39           56
               7             2            4               5           11
               8             2            1               4            7
               9             1            2               4            7
              10             0            1               4            5
            Subtotal       150          150             150          450
Total                      300          300             300          900

Test the hypotheses:

a. There is no difference between the voters and nonvoters, as so designated,
in their interest in national political affairs as expressed by the scale.

b. There is no difference between men of the various formal education
levels in their interest in national political affairs as expressed by the
scale.

c. There is no interaction between voter status and educational level of men
with respect to their interest in national political affairs as expressed
by the scale.

2. A sixth-grade teacher of arithmetic wanted to determine the effectiveness 
of the use of audio-visual aids in teaching addition, subtraction, multiplication, 
and division of common fractions to sixth-grade pupils. At the beginning of the 
semester an achievement pretest was administered to four classes, two of which 
were known to have above-average reading ability and two of which had below- 
average reading ability. One of the above-average and one of the below-average 
reading ability classes were then taught common fraction processes by means of 
extensive use of audio-visual aids, whereas practically no audio-visual aids were 
used with the remaining classes. The pretest was readministered as a final test 
at the end of the semester. The differences between the final test and the pretest 
scores were designated as the criterion and are summarized below. 


                          AUDIO-VISUAL AIDS
                        YES               NO
READING ABILITY      k     ΣX          k     ΣX
Above-Average       18    235         18    174
Below-Average       18    262         18    130

ΣX² = 10,991


Test the hypotheses: 

a. There is no difference between sixth-grade pupils taught with audio-visual 
aids and those taught without such aids in their tendency to improve 
their scores on the common fractions achievement test. 

b. There is no difference between sixth-grade pupils with above-average 
reading ability and those with below-average reading ability in their 
tendency to improve their scores on the common fraction achievement 
test. 

c. There is no interaction between reading ability status and audio-visual 
aids status with respect to the criterion designated. 

In view of the fact that differences in reading ability may be indicative of 
differences in intelligence, are your conclusions surprising? 

3. At the completion of an introductory college algebra course, the students 
rated the course as to its effectiveness. Although the rating was anonymous,
the students were asked to indicate their major area of study, to indicate whether
they were classified as freshmen or sophomores, and whether they considered 
their high school algebra courses to have been superior. The data for 240 of the 
students are summarized below. 


                              SUPERIOR HIGH SCHOOL ALGEBRA COURSE
                                    YES               NO
MAJOR              CLASS LEVEL    k     ΣX         k     ΣX
Engineering        Freshman      15   2695        15   2743
                   Sophomore     15   2450        15   2366
Natural Sciences   Freshman      15   2871        15   2771
                   Sophomore     15   2402        15   2422
Agriculture        Freshman      15   2776        15   2832
                   Sophomore     15   2213        15   2198
Liberal Arts       Freshman      15   2998        15   2789
                   Sophomore     15   2189        15   2269


Ignoring the possibility that a bias may have been introduced because not all
students had the same instructor, make all possible comparisons among the
classifications by means of one analysis of variance.

4. In County Q, a sample of 25 teachers in city high school systems was
compared with 25 teachers from rural high school systems to determine what
difference, if any, existed in their attitude toward teachers’ unions. In an attempt
to control individual differences, other than location of high school, which might
affect a high school teacher’s attitude toward teachers’ unions, the teachers were
paired on the basis of age, sex, college attended, college major, scholastic
achievement in college, length of nonteaching experience, length of teaching
experience, high school teaching specialty, and salary. The attitude scores were
as follows:

PAIR    CITY TEACHERS    RURAL TEACHERS
  1          72                83
  2          84                81
  3          64                67
  4          76                79
  5          60                51
  6          96                84
  7          64                67
  8          64                62
  9          80                74
 10          72                79
 11          69                84
 12          80                79
 13          91                87
 14          88                89
 15          57                63
 16          98                92
 17          57                66
 18          72                80
 19          92                94
 20          59                63
 21          80                85
 22          60                52
 23          52                70
 24          91                84
 25          89                93


a. Can you demonstrate any difference between city high school teachers 
and rural high school teachers in their attitude toward teachers’ unions 
as measured by the instrument used? 

b. Determine whether the pairing has been effective. 


12 


Analysis of Variance—Double 
Classification Correction for 


Disproportionality 


The ordinary methods of computing the analysis of variance with
multiple classification are applicable only when the numbers of cases in
the subclasses are proportional. When disproportionality exists among the
subclasses, ordinary methods of computation of the sums of squares yield
biased estimates for all sources of variation except that for within
subgroups. Accordingly, the recognition of disproportionate frequencies is
important.

In many instances, equal numbers of cases are included in each subclass
so that, of course, disproportionality is absent. However, it is apparent
that the subclass frequencies need not be equal in order to be proportional.
Whenever the ratio of frequencies in the cells of one category is equal to
the ratio of the frequencies in the cells of all similar categories, the
frequencies are proportional.

The two sets of data shown in Table 80 illustrate proportional data
which do not contain equal frequencies.

TABLE 80. Proportional Frequencies in Multicell Tables

A. Four-Cell Table

                    OCCUPATIONAL STATUS
                 EMPLOYED    UNEMPLOYED    TOTAL
[illegible]         115           92         207
[illegible]          80           64         144
Total               195          156         351

B. Nine-Cell Table

                      HIGH SCHOOL CURRICULUM
CLASS           COLLEGE
LEVEL           PREPARATORY    GENERAL    VOCATIONAL    TOTAL
Sophomore           72            54          36          162
Junior              60            45          30          135
Senior              40            30          20           90
Total              172           129          86          387

In the case of the four-cell table the presence of proportional frequencies
is confirmed by the fact that

115:80::92:64


or, in other words,

$$\frac{115}{80} = \frac{92}{64}$$

Crossmultiplying,

$$(80)(92) = (115)(64)$$

$$7360 = 7360$$


If this equality did not exist, disproportional frequencies would be present.
The same equality is obtained if the proportion is set up across the
occupational-status categories instead. Thus,

$$115:92::80:64 \quad\text{or}\quad \frac{115}{92} = \frac{80}{64}$$

Crossmultiplying yields the same result as did the first proportion.
In the case of the nine-cell table the presence of proportional frequencies
is established in the same manner:

72:60:40::54:45:30::36:30:20

Reducing the third number to unity in each of the three groups,

1.8:1.5:1::1.8:1.5:1::1.8:1.5:1


The presence or absence of proportional frequencies in any multiple-cell 
table can be determined by such a procedure as this. 
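The cross-multiplication check extends to a table of any size: the frequencies are proportional exactly when every cell frequency equals its row total times its column total divided by N. The following minimal Python sketch uses hypothetical function names and integer arithmetic, so no rounding enters. The four-cell layout follows the reconstruction of Table 80A (its orientation does not affect the result), and Table 82 is included as a disproportional case.

```python
from fractions import Fraction

def is_proportional(table):
    """True when every cell frequency equals (row total * column total) / N.

    Integer cross-multiplication is used, so no rounding can creep in."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return all(
        cell * n == r * c
        for row, r in zip(table, row_totals)
        for cell, c in zip(row, col_totals)
    )

def column_ratios(table):
    """Each column divided by its last entry, as in 72:60:40 -> 1.8:1.5:1."""
    return [[Fraction(x, col[-1]) for x in col] for col in zip(*table)]

four_cell = [[115, 92], [80, 64]]                        # Table 80A
nine_cell = [[72, 54, 36], [60, 45, 30], [40, 30, 20]]   # Table 80B
table82 = [[88, 30], [156, 20]]                          # Table 82 frequencies

print(is_proportional(four_cell), is_proportional(nine_cell), is_proportional(table82))
print([[float(r) for r in col] for col in column_ratios(nine_cell)])
```

The second line of output reproduces the 1.8:1.5:1 pattern for each column of the nine-cell table.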

Since disproportional subclass numbers are frequently encountered in
research in the social sciences, many investigators use a table of random
numbers to discard cases from certain cells, reducing the subclass frequencies
until they are proportional, perhaps equal. This procedure, though
convenient, causes the investigator to lose information. The loss of
information may be serious and, in some instances, is totally unnecessary.
To allow the investigator to use every case at his disposal even when
disproportional frequencies are present, modified methods of computing the
double classification analysis of variance have been devised.


FOUR-CELL TABLES 


When correcting disproportionality in a double classification with two
categories within each classification, i.e., a four-cell table, a simple and
time-saving formula is available. To state the formula, a, b, c, and d are
designated as the frequencies in the four cells, as in Table 81.




TABLE 81. Symbolical Designation of Cell Frequencies
in a Four-Cell Table

STUB ITEMS        HEADINGS         TOTAL
                  a        b        k₁
                  c        d        k₂
Total             k₃       k₄       N


The mean score of the $k_1$ cases is represented by $\bar{X}_1$, the mean score
of the $k_2$ cases by $\bar{X}_2$, the mean score of the $k_3$ cases by $\bar{X}_3$,
and the mean score of the $k_4$ cases by $\bar{X}_4$. Furthermore, the difference
between $\bar{X}_1$ and $\bar{X}_2$ equals $D_{12}$, whereas the difference between
$\bar{X}_3$ and $\bar{X}_4$ equals $D_{34}$.

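The adjustment term for disproportion is equal to

$$\text{Adjustment} = \frac{\dfrac{(ad-bc)^2}{k_1k_2k_3k_4}\left[k_1k_2D_{12}^2 + k_3k_4D_{34}^2\right] - 2D_{12}D_{34}(ad-bc)}{N\left[1 - \dfrac{(ad-bc)^2}{k_1k_2k_3k_4}\right]}$$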
This adjustment term, if positive, is to be subtracted from the sum of 
squares for interaction and added separately to the sums of squares for 
each of the two main effects, these sums of squares having been computed 
in the conventional manner. If negative, the adjustment term is added 
to the sum of squares for interaction, and subtracted separately from the 
sums of squares for each of the two main effects. 
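The adjustment formula can be transcribed directly into a short Python function. This is a minimal sketch assuming the cell layout of Table 81; the function name and argument order are hypothetical. It is applied here to the data of Table 82, with the border means computed from the sums.

```python
def four_cell_adjustment(a, b, c, d, mean1, mean2, mean3, mean4):
    """Adjustment term for disproportion in a four-cell table.

    Layout as in Table 81: a, b form the first row (k1 = a + b cases) and
    c, d the second row (k2 = c + d); a, c form the first column (k3 = a + c)
    and b, d the second column (k4 = b + d). mean1..mean4 are the border
    means of the k1, k2, k3, and k4 cases."""
    k1, k2, k3, k4 = a + b, c + d, a + c, b + d
    n = a + b + c + d
    d12 = mean1 - mean2
    d34 = mean3 - mean4
    cross = a * d - b * c
    ratio = cross ** 2 / (k1 * k2 * k3 * k4)   # the fraction that appears twice
    numerator = ratio * (k1 * k2 * d12 ** 2 + k3 * k4 * d34 ** 2) - 2 * d12 * d34 * cross
    return numerator / (n * (1 - ratio))

# Table 82: rows = fraternity member (yes, no); columns = sex (male, female)
adjustment = four_cell_adjustment(
    88, 30, 156, 20,
    9708 / 118, 13420 / 176,   # fraternity yes / no border means
    19368 / 244, 3760 / 50,    # male / female border means
)
print(round(adjustment, 2))
```

Evaluated at full precision, the formula gives about 631.39 rather than the 631.457 reported below. The small difference presumably arises in the hand computation. It does not change any of the F-values in Table 84 at two decimal places.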


TABLE 82. Sums of Attitude Scores of College Seniors 


                              SEX
MEMBER OF              MALE              FEMALE              TOTAL
SOCIAL FRATERNITY    k      ΣX         k      ΣX          k       ΣX
Yes                 88    7,332       30    2,376        118     9,708
No                 156   12,036       20    1,384        176    13,420
Total              244   19,368       50    3,760        294    23,128

ΣX² = 1,901,280


An application of the foregoing formula is illustrated in a study in- 
volving college seniors of both sexes, some of whom were members of 
social fraternities and some of whom were not. An investigator who was 
interested in the attitude of college seniors toward divorce administered 
a rating scale to a sample of 294 students and summarized his data as 
shown in Table 82. The higher the attitude test score, the more favorable
was the attitude toward divorce.




Inspection of the frequencies in Table 82 reveals that disproportionality
exists: obviously, the ratio of 88 to 156 is not equal to the ratio of 30 to
20. To compute an analysis of variance with multiple classification using
the entire sample, a correction for disproportion must be made. To present
the data in Table 82 in the most useful form for correction of the
disproportion, the mean scores for each subclass were computed; they are
shown in Table 83.

TABLE 83. Means of Attitude Scores of College Seniors

                                 SEX
MEMBER OF               MALE                FEMALE                TOTAL
SOCIAL FRATERNITY    k       X̄            k       X̄            k       X̄
Yes                 88   83.318182       30   79.200000       118   82.271186
No                 156   77.153846       20   69.200000       176   76.250000
Total              244   79.377049       50   75.200000       294   78.666667

If the two categories in each classification respond to the criterion
differently, the difference between the two members of each pair of border
means in Table 83 will not be the same as it would be if proportionality
existed.
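Comparison of Table 83 and Table 81 shows that:

$a = 88 \quad b = 30 \quad c = 156 \quad d = 20$

$k_1 = 118 \quad k_2 = 176 \quad k_3 = 244 \quad k_4 = 50 \quad N = 294$

$\bar{X}_1 = 82.271186 \quad \bar{X}_2 = 76.250000 \quad \bar{X}_3 = 79.377049 \quad \bar{X}_4 = 75.200000$

$D_{12} = 82.271186 - 76.250000 = 6.021186$

$D_{34} = 79.377049 - 75.200000 = 4.177049$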
When substituting these values into the adjustment formula, it is
convenient to solve first for the fraction

$$\frac{(ad-bc)^2}{k_1k_2k_3k_4}$$

since it appears twice in the formula:

$$\frac{(ad-bc)^2}{k_1k_2k_3k_4} = \frac{[(88)(20) - (30)(156)]^2}{(118)(176)(244)(50)} = 0.033652025$$

Substituting in the complete formula,

$$\frac{(0.03365)\left[(118)(176)(6.021)^2 + (244)(50)(4.177)^2\right] - 2(6.021)(4.177)\left[(88)(20) - (30)(156)\right]}{294(1 - 0.03365)} = 631.457$$


The adjustment term of 631.457 is subtracted from the sum of squares for
interaction and added separately to the sums of squares of the two main
effects. The sums of squares computed in the conventional manner are
now called the unadjusted sums of squares and are calculated from the
values in Table 82 as follows:


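For total:

$$\text{S.S.} = 1{,}901{,}280 - \frac{(23{,}128)^2}{294} = 81{,}877.334$$

For sex:

$$\text{S.S.} = \frac{(19{,}368)^2}{244} + \frac{(3{,}760)^2}{50} - \frac{(23{,}128)^2}{294} = 724.025$$

For fraternity status:

$$\text{S.S.} = \frac{(9{,}708)^2}{118} + \frac{(13{,}420)^2}{176} - \frac{(23{,}128)^2}{294} = 2{,}561.014$$

For interaction:

$$\text{S.S.} = \frac{(7{,}332)^2}{88} + \frac{(12{,}036)^2}{156} + \frac{(2{,}376)^2}{30} + \frac{(1{,}384)^2}{20} - \frac{(23{,}128)^2}{294} - (724.025 + 2{,}561.014) = 777.031$$

For within:

$$\text{S.S.} = 81{,}877.334 - (724.025 + 2{,}561.014 + 777.031) = 77{,}815.264$$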
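The conventional (unadjusted) sums of squares can be computed from the subclass frequencies and sums in one pass. The following minimal Python sketch takes its input as (k, ΣX) pairs arranged like Table 82; the function name and data layout are assumptions made only for illustration.

```python
def unadjusted_ss(cells, sum_x2):
    """Conventional sums of squares for a double classification.

    cells[i][j] is a (k, sum_x) pair for row i and column j; sum_x2 is the
    sum of the squared scores over all cases."""
    def squares_over_k(groups):
        return sum(s * s / k for k, s in groups)

    flat = [cell for row in cells for cell in row]
    n = sum(k for k, _ in flat)
    correction = sum(s for _, s in flat) ** 2 / n
    rows = [(sum(k for k, _ in row), sum(s for _, s in row)) for row in cells]
    cols = [(sum(k for k, _ in col), sum(s for _, s in col)) for col in zip(*cells)]
    ss_total = sum_x2 - correction
    ss_rows = squares_over_k(rows) - correction
    ss_cols = squares_over_k(cols) - correction
    ss_cells = squares_over_k(flat) - correction     # between all subclasses
    return {
        "total": ss_total,
        "rows": ss_rows,
        "columns": ss_cols,
        "interaction": ss_cells - ss_rows - ss_cols,
        "within": ss_total - ss_cells,
    }

# Table 82: rows = fraternity member (yes, no); columns = sex (male, female)
table82 = [[(88, 7332), (30, 2376)],
           [(156, 12036), (20, 1384)]]
result = unadjusted_ss(table82, 1_901_280)
for source, ss in result.items():
    print(f"{source:12s}{ss:12.3f}")
```

At full precision, the total and the two main-effect sums of squares agree with the values above to within about 0.003. The interaction, however, comes out near 776.90 rather than 777.031, and the within term near 77,815.40 rather than 77,815.264. A discrepancy of this size, presumably from the hand arithmetic, does not alter any conclusion.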


These sums are entered in the analysis of variance shown in Table 84.

TABLE 84. Analysis of Variance of Attitude Scores with Means
Adjusted for Disproportionality

SOURCE OF            DEGREES OF         SUM OF SQUARES              MEAN
VARIATION            FREEDOM       UNADJUSTED      ADJUSTED        SQUARE
Sex                      1            724.025      1,355.482      1,355.482
Fraternity Status        1          2,561.014      3,192.471      3,192.471
Interaction              1            777.031        145.574        145.574
Within                 290         77,815.264                       268.328
Total                  293         81,877.334

For sex: $F_{1,290} = \dfrac{1{,}355.482}{268.328} = 5.05$

For fraternity status: $F_{1,290} = \dfrac{3{,}192.471}{268.328} = 11.90$

For interaction: $F_{1,290} = \dfrac{145.574}{268.328} = 0.54$


The analysis of variance is completed by computing the mean squares
from the adjusted sums of squares and determining F-values. The
F-values for both main effects are significant, whereas that for the
interaction is nonsignificant.
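As an independent check on Table 84, the adjusted sum of squares for each main effect can be obtained by a different technique: the method of fitting constants, which is least squares for an additive model with no interaction. This technique is not the one the text uses, and it is offered only as a cross-check. When the adjusted factor has two levels, its sum of squares, adjusted for the other factor, reduces to a weighted combination of the within-row differences between the two column means. In the following minimal Python sketch, the function name and data layout are hypothetical.

```python
def adjusted_ss_two_level(cells):
    """Sum of squares for a two-level column factor, adjusted for the row
    factor by the method of fitting constants (additive model, no interaction).

    cells[i] = ((k_i1, sum_i1), (k_i2, sum_i2)) for row i."""
    weighted_diff = 0.0
    total_weight = 0.0
    for (k1, s1), (k2, s2) in cells:
        weight = k1 * k2 / (k1 + k2)   # reciprocal of the variance factor of the difference
        diff = s1 / k1 - s2 / k2       # difference between the two column means in this row
        weighted_diff += weight * diff
        total_weight += weight
    return weighted_diff ** 2 / total_weight

# Table 82 arranged with sex as the two-level column factor
by_fraternity = [((88, 7332), (30, 2376)),     # members: male, female
                 ((156, 12036), (20, 1384))]   # nonmembers: male, female
# ...and with fraternity status as the two-level column factor
by_sex = [((88, 7332), (156, 12036)),          # males: member, nonmember
          ((30, 2376), (20, 1384))]            # females: member, nonmember

sex_adjusted = adjusted_ss_two_level(by_fraternity)
fraternity_adjusted = adjusted_ss_two_level(by_sex)
print(f"{sex_adjusted:.2f} {fraternity_adjusted:.2f}")
```

The sketch gives about 1,355.41 for sex and 3,192.40 for fraternity status, each exceeding its unadjusted value by the same amount (about 631.39). This agrees with the rule, stated above, that a single adjustment term is added to both main effects.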




MULTIPLE-CELL TABLES 


Although the foregoing method can be applied only to a four-cell table,
several techniques have been devised that are capable of correcting
disproportionality in any double classification analysis of variance.
Perhaps the simplest and most expeditious of these methods is that reported
by Patterson.¹ According to this method, the border means of the double
classification are successively adjusted until they equal the general mean.
The adjustment terms are then used to compute the adjusted sums of
squares for one or both main effects.

Patterson’s method can, of course, be applied to a four-cell table such
as Table 83, and it would have yielded the same adjusted sums of squares
as in Table 84, except for rounding errors. The use of this method with a
multiple-cell table other than a four-cell table is illustrated by the
following example.

TABLE 85. Unadjusted Means of English Growth Scores
of Male College Freshmen

                                  PERMANENT RESIDENCE
                URBAN           RURAL NONFARM       RURAL FARM            TOTAL
CURRICULUM    k      X̄         k      X̄          k      X̄           k      X̄
Engineering  13  83.153846     10  85.700000     14  85.142857      37  84.594594
Science       7  82.428571     14  82.642857      6  83.666667      27  82.814815
Agriculture  12  83.666667      7  87.000000     30  84.466667      49  84.632653
Total        32  83.187500     31  84.612903     50  84.560000     113  84.185841

Adjustment for curriculum means: Engineering = −0.408753
                                 Science = 1.371026
                                 Agriculture = −0.446812

A research worker was interested in the growth in ability to use 
English grammar on the part of male college freshmen. A sample of 113 
freshmen was administered a pretest immediately prior to the beginning 
of a one-semester English course, and a final test at the completion of 
the semester. The difference between the pretest and the final test scores 
was considered to be a measure of growth in ability to apply English 
grammar. The data are summarized in Table 85. 

As disproportional frequencies exist in the subclasses, an adjustment
must be made in order to compute a double classification analysis of
variance involving all 113 cases. When Patterson’s method is applied,
Table 85 is the beginning point for the series of adjustments of the means
that are to be made. The adjustments could begin with either the
curriculum or the residence border means. In this instance the adjustment
for the curriculum means was computed first.

¹ R. E. Patterson, “The Use of Adjusting Factors in the Analysis of Data with
Disproportionate Subclass Numbers,” Journal of the American Statistical
Association, 41:334-346, Sept., 1946.

If the general mean² of 84.185841 is taken as a reference point, it is
apparent that the mean of the engineering students differs from the
general mean by the quantity 84.185841 − 84.594594, or −0.408753. In the
same manner, the mean of the science students differs from the general
mean by 84.185841 − 82.814815, or 1.371026. The mean of the agriculture
students differs from the general mean by 84.185841 − 84.632653, or
−0.446812.

The first adjustment for curriculum is made by subtracting 0.408753
from the means of engineering students living in each type of residence,
by adding 1.371026 to the means of science students living in each type
of residence, and by subtracting 0.446812 from the means of agriculture
students living in each type of residence.

For engineering students:

Urban: 83.153846 − 0.408753 = 82.745093
Rural Nonfarm: 85.700000 − 0.408753 = 85.291247
Rural Farm: 85.142857 − 0.408753 = 84.734104

For science students:

Urban: 82.428571 + 1.371026 = 83.799597
Rural Nonfarm: 82.642857 + 1.371026 = 84.013883
Rural Farm: 83.666667 + 1.371026 = 85.037693

For agriculture students:

Urban: 83.666667 − 0.446812 = 83.219855
Rural Nonfarm: 87.000000 − 0.446812 = 86.553188
Rural Farm: 84.466667 − 0.446812 = 84.019855

Because of these adjustments, the border means for all residence
categories must be corrected. The new border means are computed as follows:

² Here, as in many instances throughout this book, the calculations have been
carried far beyond the number of decimal places for which confidence in accuracy
can be assured for any given mean. This procedure has been followed to overcome
the magnification of rounding errors in subsequent mathematical manipulation,
particularly when multiplication is involved. It is recommended, of course, that
final reported values, after all mathematical treatment has been completed, be
consistent with the accuracy of the original data. Failure to carry decimal places
beyond the number usually considered as significant figures will render the method
proposed by Patterson extremely difficult to defend from the standpoint of ultimate
accuracy after the completion of mathematical treatment.




For urban students:

$$\bar{X} = \frac{(13)(82.745093) + (7)(83.799597) + (12)(83.219855)}{32} = 83.153802$$

For rural nonfarm students:

$$\bar{X} = \frac{(10)(85.291247) + (14)(84.013883) + (7)(86.553188)}{31} = 84.999327$$

For rural farm students:

$$\bar{X} = \frac{(14)(84.734104) + (6)(85.037693) + (30)(84.019855)}{50} = 84.341985$$

The adjusted means are shown in Table 86.

TABLE 86. First Adjustment of Curriculum Means

                                  PERMANENT RESIDENCE
                URBAN           RURAL NONFARM       RURAL FARM            TOTAL
CURRICULUM    k      X̄         k      X̄          k      X̄           k      X̄
Engineering  13  82.745093     10  85.291247     14  84.734104      37  84.185841
Science       7  83.799597     14  84.013883      6  85.037693      27  84.185841
Agriculture  12  83.219855      7  86.553188     30  84.019855      49  84.185841
Total        32  83.153802     31  84.999327     50  84.341985     113  84.185841

Adjustment for residence means: Urban = 1.032039
                                Rural Nonfarm = −0.813486
                                Rural Farm = −0.156144
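The first pass of the procedure (from Table 85 to Table 86) involves only weighted border means and additions. The following minimal Python sketch starts from the subclass frequencies and means of Table 85; the helper names are hypothetical. It computes the curriculum adjustment terms, applies them, and then derives the new residence border means and residence adjustment terms.

```python
# Table 85: subclass frequencies and unadjusted means
# rows = curriculum (engineering, science, agriculture)
# columns = residence (urban, rural nonfarm, rural farm)
counts = [[13, 10, 14], [7, 14, 6], [12, 7, 30]]
means = [[83.153846, 85.700000, 85.142857],
         [82.428571, 82.642857, 83.666667],
         [83.666667, 87.000000, 84.466667]]

def general_mean(counts, means):
    n = sum(sum(row) for row in counts)
    return sum(k * m for kr, mr in zip(counts, means) for k, m in zip(kr, mr)) / n

def border_means(counts, means):
    """Weighted border means of the rows (transpose the inputs for columns)."""
    return [sum(k * m for k, m in zip(kr, mr)) / sum(kr) for kr, mr in zip(counts, means)]

def transpose(table):
    return [list(col) for col in zip(*table)]

gm = general_mean(counts, means)
# Adjustment for curriculum means: general mean minus each row border mean
curriculum_adj = [gm - b for b in border_means(counts, means)]
# First adjustment: add each row's term to every subclass mean in that row
adjusted = [[m + a for m in row] for row, a in zip(means, curriculum_adj)]
# New residence border means, then the residence adjustment terms
residence_means = border_means(transpose(counts), transpose(adjusted))
residence_adj = [gm - b for b in residence_means]
print([round(a, 6) for a in curriculum_adj])
print([round(b, 6) for b in residence_means])
print([round(a, 6) for a in residence_adj])
```

The printed values agree with those below Table 85 and in Table 86 to within about one unit in the sixth decimal place. The engineering term, for example, comes out −0.408754 because the text subtracts a border mean truncated to 84.594594.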


Adjustment for residence means is now in order. The differences between
the general mean and the urban, rural nonfarm, and rural farm border
means are found in the usual manner.

For urban students: 84.185841 − 83.153802 = 1.032039
For rural nonfarm students: 84.185841 − 84.999327 = −0.813486
For rural farm students: 84.185841 − 84.341985 = −0.156144


These differences are applied with the appropriate sign to the subclass
means as follows:

For urban students:
Engineering: 82.745093 + 1.032039 = 83.777132
Science: 83.799597 + 1.032039 = 84.831636
Agriculture: 83.219855 + 1.032039 = 84.251894

For rural nonfarm students:
Engineering: 85.291247 − 0.813486 = 84.477761
Science: 84.013883 − 0.813486 = 83.200397
Agriculture: 86.553188 − 0.813486 = 85.739702

For rural farm students:
Engineering: 84.734104 − 0.156144 = 84.577960
Science: 85.037693 − 0.156144 = 84.881549
Agriculture: 84.019855 − 0.156144 = 83.863711


The border means of the various curricula are also recomputed.

For engineering students:

$$\bar{X} = \frac{(13)(83.777132) + (10)(84.477761) + (14)(84.577960)}{37} = 84.269507$$

For science students:

$$\bar{X} = \frac{(7)(84.831636) + (14)(83.200397) + (6)(84.881549)}{27} = 83.996900$$

For agriculture students:

$$\bar{X} = \frac{(12)(84.251894) + (7)(85.739702) + (30)(83.863711)}{49} = 84.226775$$

The results for the first adjustment of residence means are shown in
Table 87.


TABLE 87. First Adjustment of Residence Means

                                  PERMANENT RESIDENCE
                URBAN           RURAL NONFARM       RURAL FARM            TOTAL
CURRICULUM    k      X̄         k      X̄          k      X̄           k      X̄
Engineering  13  83.777132     10  84.477761     14  84.577960      37  84.269507
Science       7  84.831636     14  83.200397      6  84.881549      27  83.996900
Agriculture  12  84.251894      7  85.739702     30  83.863711      49  84.226775
Total        32  84.185841     31  84.185841     50  84.185841     113  84.185841

Adjustment for curriculum means: Engineering = −0.083666
                                 Science = 0.188941
                                 Agriculture = −0.040934




TABLE 88. Second Adjustment of Curriculum Means

                                  PERMANENT RESIDENCE
                URBAN           RURAL NONFARM       RURAL FARM            TOTAL
CURRICULUM    k      X̄         k      X̄          k      X̄           k      X̄
Engineering  13  83.693466     10  84.394095     14  84.494294      37  84.185841
Science       7  85.020577     14  83.389338      6  85.070490      27  84.185841
Agriculture  12  84.210960      7  85.698768     30  83.822777      49  84.185841
Total        32  84.177832     31  84.234937     50  84.160527     113  84.185841

Adjustment for residence means: Urban = 0.008009
                                Rural Nonfarm = −0.049096
                                Rural Farm = 0.025314


TABLE 89. Second Adjustment of Residence Means

                                  PERMANENT RESIDENCE
                URBAN           RURAL NONFARM       RURAL FARM            TOTAL
CURRICULUM    k      X̄         k      X̄          k      X̄           k      X̄
Engineering  13  83.701475     10  84.344999     14  84.519608      37  84.184964
Science       7  85.028586     14  83.340242      6  85.095804      27  84.168085
Agriculture  12  84.218969      7  85.649672     30  83.848091      49  84.196287
Total        32  84.185841     31  84.185841     50  84.185841     113  84.185841

Adjustment for curriculum means: Engineering = 0.000877
                                 Science = 0.017756
                                 Agriculture = −0.010446


TABLE 90. Third Adjustment of Curriculum Means

                                  PERMANENT RESIDENCE
                URBAN           RURAL NONFARM       RURAL FARM            TOTAL
CURRICULUM    k      X̄         k      X̄          k      X̄           k      X̄
Engineering  13  83.702352     10  84.345876     14  84.520485      37  84.185841
Science       7  85.046342     14  83.357998      6  85.113560      27  84.185841
Agriculture  12  84.208523      7  85.639226     30  83.837645      49  84.185841
Total        32  84.186164     31  84.191784     50  84.181950     113  84.185841

Adjustment for residence means: Urban = −0.000323
                                Rural Nonfarm = −0.005943
                                Rural Farm = 0.003891




TABLE 91. Third Adjustment of Residence Means

                                  PERMANENT RESIDENCE
                URBAN           RURAL NONFARM       RURAL FARM            TOTAL
CURRICULUM    k      X̄         k      X̄          k      X̄           k      X̄
Engineering  13  83.702029     10  84.339933     14  84.524376      37  84.185594
Science       7  85.046019     14  83.352055      6  85.117451      27  84.183541
Agriculture  12  84.208200      7  85.633283     30  83.841536      49  84.187295
Total        32  84.185841     31  84.185841     50  84.185841     113  84.185841

Adjustment for curriculum means: Engineering = 0.000247
                                 Science = 0.002300
                                 Agriculture = −0.001454


The pattern of the calculations is now evident. Successive adjustments 
of the type described can be made for the curriculum means and the 
residence means until the value of all border means is identical with the 
general mean, 84.185841. At this point complete adjustment for dispro- 
portional frequencies has been made. The number of corrections necessary 
to attain this condition is dependent upon the amount of dispropor- 
tionality present. 

In actual practice it is not necessary to repeat the adjustment until 
the value of all border means is identical with the general mean. Since 
the adjustment terms are used to compute the adjusted sums of squares 
for the main effects, the adjustment sequence is stopped when the adjust- 
ment terms become so small that they no longer influence the sum of 
squares to the number of decimal places desired. Therefore, the adjust- 
ments in the problem at hand are terminated with the third adjustment 
of residence means as shown in Table 91. 
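The whole sequence of adjustments, together with the stopping rule just described, can be sketched in Python. This is a minimal sketch under our own assumptions. The function name `patterson_adjusted_ss`, the tolerance `tol`, and the `max_rounds` cap are hypothetical choices that do not come from Patterson's paper. The rows (curriculum) are adjusted first, as in the text, and the iteration stops once a full round of adjustments contributes less than `tol` to the sum of squares.

```python
def patterson_adjusted_ss(counts, means, tol=1e-12, max_rounds=200):
    """Sketch of Patterson's successive adjustment of border means.

    The row border means are adjusted first, then the column border means,
    and so on. Returns the adjusted sum of squares for the column factor:
    sum(k * adjustment**2) over every adjustment except the first row one."""
    means = [list(row) for row in means]            # work on a copy
    rows, cols = range(len(counts)), range(len(counts[0]))
    row_k = [sum(counts[i][j] for j in cols) for i in rows]
    col_k = [sum(counts[i][j] for i in rows) for j in cols]
    gm = sum(counts[i][j] * means[i][j] for i in rows for j in cols) / sum(row_k)

    ss = 0.0
    for round_no in range(max_rounds):
        # Bring every row border mean to the general mean.
        row_adj = [gm - sum(counts[i][j] * means[i][j] for j in cols) / row_k[i]
                   for i in rows]
        for i in rows:
            for j in cols:
                means[i][j] += row_adj[i]
        row_ss = sum(k * a * a for k, a in zip(row_k, row_adj))
        if round_no > 0:                 # skip the first row adjustment
            ss += row_ss
        # Bring every column border mean to the general mean.
        col_adj = [gm - sum(counts[i][j] * means[i][j] for i in rows) / col_k[j]
                   for j in cols]
        for i in rows:
            for j in cols:
                means[i][j] += col_adj[j]
        col_ss = sum(k * a * a for k, a in zip(col_k, col_adj))
        ss += col_ss
        if round_no > 0 and row_ss + col_ss < tol:
            break                        # further terms no longer matter
    return ss

# Table 85 (rows = curriculum, columns = residence)
counts = [[13, 10, 14], [7, 14, 6], [12, 7, 30]]
means = [[83.153846, 85.700000, 85.142857],
         [82.428571, 82.642857, 83.666667],
         [83.666667, 87.000000, 84.466667]]
print(round(patterson_adjusted_ss(counts, means), 4))
```

Run on the Table 85 means, the sketch returns approximately 57.247, which agrees with the hand total of 57.2467 below to within rounding.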

The adjusted sum of squares for residence can now be computed, since
the first correction was made for the curriculum means. All adjustment
factors are used except those employed for the first adjustment for
curriculum, shown below Table 85. Thus, the adjusted sum of squares for
residence is:


$$\begin{aligned}
\text{Table 86:}\quad & (32)(1.032039)^2 + (31)(-0.813486)^2 + (50)(-0.156144)^2 = 55.8169\\
\text{Table 87:}\quad & (37)(-0.083666)^2 + (27)(0.188941)^2 + (49)(-0.040934)^2 = 1.3050\\
\text{Table 88:}\quad & (32)(0.008009)^2 + (31)(-0.049096)^2 + (50)(0.025314)^2 = 0.1088\\
\text{Table 89:}\quad & (37)(0.000877)^2 + (27)(0.017756)^2 + (49)(-0.010446)^2 = 0.0139\\
\text{Table 90:}\quad & (32)(-0.000323)^2 + (31)(-0.005943)^2 + (50)(0.003891)^2 = 0.0019\\
\text{Table 91:}\quad & (37)(0.000247)^2 + (27)(0.002300)^2 + (49)(-0.001454)^2 = 0.0002\\
\text{Total:}\quad & 57.2467
\end{aligned}$$


Obviously further adjustment beyond Table 91 is unnecessary if the 
number of decimal places is restricted to four. 
The adjusted sum of squares for curriculum can be found by repeating
the entire adjustment process, adjusting for the residence means first
rather than for the curriculum means as in Table 85, and then computing
the adjusted sum of squares in the same manner as that for residence.
However, this is extremely laborious. A shorter method is available when
the unadjusted sums of squares are used. When the unadjusted sum of
squares for residence is subtracted from the adjusted sum of squares,
an adjustment-for-disproportionality term of the same type as that
described at the beginning of the chapter is found. Thus,


$$57.2467 - 44.5408 = 12.7059$$


TABLE 92. Analysis of Variance of English Growth Scores with Means
Adjusted for Disproportionality

SOURCE OF        DEGREES OF         SUM OF SQUARES            MEAN
VARIATION        FREEDOM       UNADJUSTED      ADJUSTED      SQUARE
Residence             2           44.5408       57.2467      28.6234
Curriculum            2           66.7076       79.4135      39.7068
Interaction           4           55.9420       43.2361      10.8090
Within              104        3,749.9069                    36.0568
Total               112        3,917.0973

For residence: $F_{2,104} = \dfrac{28.6234}{36.0568} = 0.79$

For curriculum: $F_{2,104} = \dfrac{39.7068}{36.0568} = 1.10$

For interaction: $F_{4,104} = \dfrac{10.8090}{36.0568} = 0.30$

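This adjustment term is added to the unadjusted sum of squares for
curriculum and subtracted from that for interaction, as in Table 92. If the
difference between the adjusted and unadjusted sums of squares for
residence were negative, the adjustment term would be subtracted from
the unadjusted sum of squares for curriculum and added to that for
interaction to obtain adjusted values. F-values are computed as usual and
the customary interpretations made.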
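With a computer, the laborious second pass costs almost nothing. It can therefore be used to confirm that the shortcut is exact: the difference between the adjusted and unadjusted sums of squares is the same whichever factor is adjusted first. The following minimal Python sketch repeats the hypothetical `patterson_adjusted_ss` function so that the block is self-contained. It runs the procedure once on Table 85 and once on its transpose (residence adjusted first); the helper names are our own.

```python
def patterson_adjusted_ss(counts, means, tol=1e-12, max_rounds=200):
    """Adjusted SS for the column factor by successive adjustment (rows first)."""
    means = [list(row) for row in means]
    rows, cols = range(len(counts)), range(len(counts[0]))
    row_k = [sum(counts[i][j] for j in cols) for i in rows]
    col_k = [sum(counts[i][j] for i in rows) for j in cols]
    gm = sum(counts[i][j] * means[i][j] for i in rows for j in cols) / sum(row_k)
    ss = 0.0
    for round_no in range(max_rounds):
        row_adj = [gm - sum(counts[i][j] * means[i][j] for j in cols) / row_k[i]
                   for i in rows]
        for i in rows:
            for j in cols:
                means[i][j] += row_adj[i]
        row_ss = sum(k * a * a for k, a in zip(row_k, row_adj))
        if round_no > 0:                 # the first row adjustment is not counted
            ss += row_ss
        col_adj = [gm - sum(counts[i][j] * means[i][j] for i in rows) / col_k[j]
                   for j in cols]
        for i in rows:
            for j in cols:
                means[i][j] += col_adj[j]
        col_ss = sum(k * a * a for k, a in zip(col_k, col_adj))
        ss += col_ss
        if round_no > 0 and row_ss + col_ss < tol:
            break
    return ss

def unadjusted_column_ss(counts, means):
    """Conventional sum of squares for the column factor, from subclass means."""
    rows, cols = range(len(counts)), range(len(counts[0]))
    n = sum(counts[i][j] for i in rows for j in cols)
    gm = sum(counts[i][j] * means[i][j] for i in rows for j in cols) / n
    ss = 0.0
    for j in cols:
        k = sum(counts[i][j] for i in rows)
        border = sum(counts[i][j] * means[i][j] for i in rows) / k
        ss += k * (border - gm) ** 2
    return ss

def transpose(table):
    return [list(col) for col in zip(*table)]

counts = [[13, 10, 14], [7, 14, 6], [12, 7, 30]]   # Table 85 frequencies
means = [[83.153846, 85.700000, 85.142857],
         [82.428571, 82.642857, 83.666667],
         [83.666667, 87.000000, 84.466667]]

residence_adj = patterson_adjusted_ss(counts, means)          # curriculum first
residence_unadj = unadjusted_column_ss(counts, means)
curriculum_adj = patterson_adjusted_ss(transpose(counts), transpose(means))
curriculum_unadj = unadjusted_column_ss(transpose(counts), transpose(means))

print(f"residence:  {residence_unadj:.4f} -> {residence_adj:.4f}")
print(f"curriculum: {curriculum_unadj:.4f} -> {curriculum_adj:.4f}")
print(f"adjustment term: {residence_adj - residence_unadj:.4f} "
      f"and {curriculum_adj - curriculum_unadj:.4f}")
```

The two adjustment terms agree, which confirms the shortcut. Recomputed from the means of Table 85, the unadjusted sums of squares come out near 44.55 for residence and 66.72 for curriculum, rather than 44.5408 and 66.7076. The adjustment term is therefore about 12.70 instead of 12.7059. Differences of this size, presumably from rounding in the hand computation, leave the F-values of Table 92 essentially unchanged.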

It should be noted that Patterson’s method of adjusting means is ca- 
pable of adjusting the disproportionality existing in any double classifica- 
tion irrespective of the number of categories within the two classifications. 
The method is not applicable to data classified in three or more ways. 


Exercises 


1. A state supervisor compiled the vocabulary test scores of 46 gifted
seventh-grade pupils and wished to determine whether the location of the
elementary school they had attended influenced the test scores. A gifted child
was defined as one whose intelligence quotient surpassed 135. The test scores
of the sample, when classified according to sex and location of elementary
school, are summarized in the following table.

LOCATION OF              MALE            FEMALE            TOTAL
ELEMENTARY SCHOOL      k      ΣX       k      ΣX        N      ΣX
Urban                 11     332      15     376       26     708
Rural                 13     295       7     167       20     462
Total                 24     627      22     543       46    1170

ΣX² = 33,892

By means of a multiple classification analysis of variance, test
a. whether the vocabulary mean score for pupils who attended urban
elementary schools differed significantly from the vocabulary mean score
for pupils who attended rural elementary schools,
b. whether the vocabulary mean score for boys differed significantly from
the vocabulary mean score for girls,
c. whether there is an interaction between the sex classification and the
location of the elementary school.

Correct for the disproportionality present in the data.
2. In order to facilitate the teaching of advanced subjects, a certain college
required that a one-semester review course in pre-calculus mathematics be taken
by all entering freshmen who were found to be deficient in mathematics. Since the
course was highly accelerated, additional help classes were regularly scheduled
and the students were urged to attend. The achievement of each student in the
review course was recorded as either A = 4; B = 3; C = 2; D = 1; or F = 0.

                           ATTENDED HELP CLASSES
RETURNED FOR            YES               NO               TOTAL
SOPHOMORE YEAR       k     ΣX         k     ΣX         N      ΣX
Yes                163    585       101    160       264     745
No                  76    205        81    112       157     317
Total              239    790       182    272       421   1,062

a. By means of the final marks summarized in the foregoing table, determine
whether students who eventually returned for their sophomore year differed
in review course achievement from students who did not return for their
sophomore year; whether students who attended additional help classes
differed in review course achievement from those who did not; and whether
interaction exists between the two classifications. Make a correction for
the existing disproportionality.

b. In the foregoing analysis individual differences in scholastic aptitude have
been ignored. Considering the nature of the problem, do you consider this
omission to be advisable?

3. Each of 93 male liberal arts seniors, all of whom were known to be in the
upper half of their class, selected whichever of the following three methods he
preferred to follow in memorizing a list of nonsense syllables. The whole learning
method involved reading the list from beginning to end, then going back to
the beginning and again reading the entire list, and repeating until the syllables 
were learned. The part method consisted of learning the first syllable, then the 
first and second, then the first, second, and third, and continuing in that manner. 
The combination method divided the list into fifths so that each fifth could be 
learned separately by the whole method. 

Each subject also decided whether he wanted to memorize silently or aloud 
and, under carefully supervised conditions, memorized the list by the method 
chosen. The subjects returned their lists and were instructed to make no effort 
to recite the list until asked to do so. In the following table are summarized 
the number of syllables correctly recalled at a later date. 


                  SILENT           ALOUD            TOTAL
METHOD          k     ΣX        k     ΣX        N      ΣX
Whole          11    309        7    202       18     511
Combination    22    540       23    554       45   1,094
Part           21    483        9    214       30     697
Total          54  1,332       39    970       93   2,302

ΣX² = 59,268


Using all 93 cases, test the null hypotheses that 
a. in terms of the number of syllables retained the methods do not differ 
significantly. 
b. in terms of the number of syllables retained memorizing silently and 
aloud do not differ significantly. 
c. in terms of the number of syllables retained there is no interaction 
between the two classifications. 
Correct for the disproportionality present. 
4. A cynicism test was administered to men and women college students of 
various class levels. The results are listed in the following table. 


                    MEN              WOMEN            TOTAL
CLASS LEVEL      k      ΣX        k      ΣX        N      ΣX
Freshman        32    7,183      91   18,505     123   25,688
Sophomore       24    5,398     124   22,978     148   28,376
Junior          24    5,651      33    6,913      57   12,564
Senior          46   10,010      13    2,686      59   12,696
Total          126   28,242     261   51,082     387   79,324

ΣX² = 17,379,133




By means of a double classification analysis of variance, test for significance 
a. the difference between the means for the two sexes 
b. the difference among the means for class levels 

Correct for the disproportionality present. 


13 


Linear Regression 


Research workers in the social sciences frequently meet problems in
which it is desirable to predict one characteristic of individuals from one 
or more other characteristics. For example, in a study of class achieve- 
ment in a high school general science course, the investigator may wish 


may be predicted from known values in the other distribution within the 
limits of the available data. In the foregoing example, if the achievement 


If the data representing the criterion and the prediction variable were 
plotted, the line assumed to be satisfactory for predicting the criterion 
from the prediction variable is called a regression line. The mathematical
equation for representing this regression line, which may or may not be 
linear, is called the regression equation. Owing to insufficient data, if for 
no other reason, the plotted data in actual situations seldom more than 
approximate a straight line or any other smooth curve. The question of 
whether the relationship is linear or curvilinear will be considered in 
the following chapter. 


SINGLE PREDICTION VARIABLE 


In many situations in psychology and education the assumption of a 
linear relationship between the criterion and prediction variable does not 
contradict existing knowledge of the field. Perhaps such a relationship 
should be called the straight line relationship. In these cases, it is as- 
sumed that unit changes in the prediction variable are accompanied by 



proportional changes in the criterion. An equation representing such a 
relationship is: 
$$Y = aX + C$$
where Y = criterion
      X = prediction variable
      a and C = appropriate constants

The values of a and C are chosen so as to give the best possible prediction equation, which is usually defined as the one for which the sum of the squares of the errors between actual and predicted Y-values is a minimum. These errors of prediction are called residuals.
In the equation Y = aX + C, values of a and C are desired such that if a Y-value is predicted for each X-value,
$$\Sigma(Y - aX - C)^2 = \text{a minimum}$$
By the calculus, these values may be had by differentiating with respect to a and C respectively, and setting each first derivative equal to zero; thus
$$\Sigma(Y - aX - C)(-2X) = 0$$
$$\Sigma(Y - aX - C)(-2) = 0$$
Making the indicated multiplications and dividing each of these equations by (−2), the normal equations are produced:
$$\Sigma XY = a\Sigma X^2 + C\Sigma X$$
$$\Sigma Y = a\Sigma X + NC$$
These normal equations may be utilized in computing the equation of the best fitting straight line that may be passed among the data.
An example shown in Table 93 should clarify the procedure for obtaining a regression equation. In this case weights of men are to be predicted from their heights; Y (weight) is the criterion, whereas X (height) is the prediction variable. When the needed values from Table 93 are substituted in the normal equations, they become
$$(1)\quad 110{,}600 = 47{,}640a + 690C$$
$$(2)\quad 1{,}600 = 690a + 10C$$
If equation (1) is repeated and equation (2) is multiplied by 69 and subtracted from it, C is eliminated:
$$110{,}600 = 47{,}640a + 690C$$
$$110{,}400 = 47{,}610a + 690C$$
$$200 = 30a$$
Solving, a = 6⅔. Substituting 6⅔ for a in (2),
$$1{,}600 = 690\left(6\tfrac{2}{3}\right) + 10C$$
Solving, C = −300.

The regression equation, then, for predicting weights from heights as judged from the available data is
$$Y = 6\tfrac{2}{3}X - 300$$
Thus, to predict weight for a man, his height in inches is multiplied by 6⅔ and the product diminished by 300 pounds.
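The elimination just carried out can be checked with a short Python sketch. It is an illustration only: the function name `fit_line` is hypothetical, and exact fractions are used so that a = 6⅔ appears exactly rather than as a rounded decimal. The data are the ten pairs of Table 93.

```python
from fractions import Fraction

# Table 93: weights Y (pounds) and heights X (inches) of ten men
WEIGHTS = [160, 200, 140, 150, 130, 160, 150, 150, 140, 220]
HEIGHTS = [70, 72, 68, 66, 67, 69, 68, 71, 70, 69]


def fit_line(x, y):
    """Solve the normal equations
        sum(XY) = a*sum(X^2) + C*sum(X)
        sum(Y)  = a*sum(X)   + N*C
    exactly and return (a, C)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))
    # Eliminating C between the two equations gives
    # a = (N*sum(XY) - sum(X)*sum(Y)) / (N*sum(X^2) - sum(X)^2)
    a = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
    # Back-substitute into the second normal equation for C
    c = (sum_y - a * sum_x) / n
    return a, c


a, c = fit_line(HEIGHTS, WEIGHTS)
print(a, c)          # 20/3 -300
print(a * 70 + c)    # predicted weight for a 70-inch man: 500/3
```

The second line prints 500/3, that is, 166⅔ pounds, the predicted weight for Individuals 1 and 9.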


TABLE 93. Heights and Weights of Ten Men

               WEIGHT     HEIGHT
               (LBS.)     (INCHES)
INDIVIDUAL        Y          X           XY          Y²         X²
     1           160         70        11,200      25,600      4,900
     2           200         72        14,400      40,000      5,184
     3           140         68         9,520      19,600      4,624
     4           150         66         9,900      22,500      4,356
     5           130         67         8,710      16,900      4,489
     6           160         69        11,040      25,600      4,761
     7           150         68        10,200      22,500      4,624
     8           150         71        10,650      22,500      5,041
     9           140         70         9,800      19,600      4,900
    10           220         69        15,180      48,400      4,761
  Total        1,600        690       110,600     263,200     47,640


The height and weight values listed in Table 93 have been plotted in Figure 23. Lines representing the regression equation, the mean weight (Ȳ), and the mean height (X̄) are also shown. It should be noted that the three lines intersect at one point. The slope of the regression line is 6⅔, the value of a, and the Y-intercept is −300, the value of C.


[Fig. 23. Graphic Presentation of Regression. Axes: Height (X) and Weight (Y).]


Inspection of the figure reveals that only one of the ten points falls upon the regression line, whereas several points deviate considerably from it. This result may be surprising, since the regression line was computed so as to obtain the best possible prediction equation. It is not unusual, however, for none of the points to fall on the regression line and for some of them to lie far from its path.

It can also be noted that only two points fall on the line representing 
the mean weight, and again, many points are found at some distance 
from this line. These deviations away from the regression line and away 
from the mean of the criterion are important in the determination of the 
effectiveness of a regression equation. 


[Fig. 24. Graphic Presentation of Deviations from Ȳ. Vertical axis: Weight (Y).]


The deviations of the ten points away from the mean weight line are shown in Figure 24. The deviations range from 60 pounds for Individual 10, who weighed 220 pounds, to zero pounds for Individual 6, who weighed 160 pounds, the mean weight of the ten men. If the distances from each point to the mean line were squared individually and then summed, the resulting value would be the sum of the squares of the deviations away from the mean. This procedure can be recognized as the same type of calculation encountered in the analysis of variance.
Finding the sum of squares of the deviations away from the mean by measuring the lengths of the vertical lines, squaring them, and summing them is a laborious and inaccurate procedure. Instead, the equation
$$\Sigma y^2 = \Sigma Y^2 - \frac{(\Sigma Y)^2}{N}$$
is used. In the case of the weights of the ten men, the data in Table 93 are substituted:
$$\Sigma y^2 = 263{,}200 - \frac{(1{,}600)^2}{10} = 7{,}200$$


This value is known as the sum of squares for total and is representative 
of the scattering of the points around the mean weight line. 

The deviations of the points away from the regression line can also be examined more closely by drawing the appropriate vertical lines as in Figure 25. The intersection of the vertical line from any point with the regression line gives the predicted weight for that individual. For example, Individual 1, whose height is 70 inches, has a predicted weight of 166⅔ pounds. Since his actual weight is 160 pounds, the error of prediction is 6⅔ pounds, which is designated as a negative error (−6⅔ pounds). Individual 9 is also 70 inches in height, and his predicted weight is likewise 166⅔ pounds. His actual weight, however, is only 140, and the resulting error of prediction is −26⅔ pounds.

[Fig. 25. Graphic Presentation of Deviations from the Regression Line. Axes: Height (X) and Weight (Y).]

In the same manner careful measurement of the deviations in Figure 
25 would establish each error of prediction. However, this procedure is 
also inaccurate, and as already mentioned, the equation of the regression 
line affords a rapid method of prediction. When the equation is solved 
for each of the ten men, predicted weights are found as shown in Table 94. 

In each case the predicted weight has been subtracted from the actual weight to obtain the error of prediction, or residual. The sum of the residuals is necessarily zero. In the last column, each of the residuals has been squared; their sum is 5,866⅔. Because the regression equation was so computed that the sum of the squares of the errors of prediction would be a minimum, any values of a and C other than 6⅔ and −300 will yield a greater sum. This sum of squares for residuals, 5,866⅔, can also be found without the labor of individual predictions from the following equation:
$$\text{Sum of squares for residuals} = \Sigma Y^2 - a\Sigma XY - C\Sigma Y$$
$$= 263{,}200 - \left(6\tfrac{2}{3}\right)(110{,}600) - (-300)(1{,}600) = 5{,}866\tfrac{2}{3}$$
Actually, the latter method is nearly always used since it is much less 


time-consuming. Only when prediction is wanted for each individual, 
and that is seldom, will the sum of squares for residuals be obtained as 


in Table 94. 


TABLE 94. Residual Weights of Ten Men

                    WEIGHT                         RESIDUAL
INDIVIDUAL     ACTUAL    PREDICTED    RESIDUAL     SQUARED
     1           160      166 2/3      −6 2/3        44 4/9
     2           200      180          20           400
     3           140      153 1/3     −13 1/3       177 7/9
     4           150      140          10           100
     5           130      146 2/3     −16 2/3       277 7/9
     6           160      160           0             0
     7           150      153 1/3      −3 1/3        11 1/9
     8           150      173 1/3     −23 1/3       544 4/9
     9           140      166 2/3     −26 2/3       711 1/9
    10           220      160          60         3,600
  Total        1,600    1,600           0         5,866 2/3
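Both routes to the residual sum of squares can be compared in a minimal Python sketch, assuming the constants a = 6⅔ and C = −300 found above and the data of Table 93. The direct route reproduces Table 94; the shortcut uses only the column totals.

```python
from fractions import Fraction

# Table 93 data and the fitted constants a = 6 2/3, C = -300
WEIGHTS = [160, 200, 140, 150, 130, 160, 150, 150, 140, 220]
HEIGHTS = [70, 72, 68, 66, 67, 69, 68, 71, 70, 69]
A, C = Fraction(20, 3), Fraction(-300)

# Direct route (Table 94): predict each man, then square and sum the residuals
residuals = [y - (A * x + C) for x, y in zip(HEIGHTS, WEIGHTS)]
ss_direct = sum(r * r for r in residuals)

# Shortcut route: sum(Y^2) - a*sum(XY) - C*sum(Y), no individual predictions
sum_y2 = sum(y * y for y in WEIGHTS)
sum_xy = sum(x * y for x, y in zip(HEIGHTS, WEIGHTS))
ss_formula = sum_y2 - A * sum_xy - C * sum(WEIGHTS)

print(sum(residuals))         # 0
print(ss_direct, ss_formula)  # 17600/3 17600/3  (that is, 5,866 2/3)
```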


Usually a and C in a regression equation are expressed as decimal fractions. The number of decimal places to be carried in the calculations is generally determined by the accuracy required in the solution of the simultaneous equations. A good practice, in most situations, is to carry as many decimal places as an eight-bank calculating machine will yield.
The decision as to the number of decimal places to carry may also be based upon the purpose of the prediction. If the purpose of the regression is to predict for individuals only, it is clear that 6.7 can be substituted for 6⅔ with little error. If group prediction is desired, however, more decimal places are required. For example, if the sum of squares for residuals for all individuals is computed from
$$\Sigma Y^2 - a\Sigma XY - C\Sigma Y$$
the substitution of 6.7 for 6⅔ will yield a large discrepancy, since the values a and C are multiplied by sizable numbers. Although in the problem stated the discrepancy may not be serious, it becomes of greater concern as the number of cases increases.
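The effect of rounding can be made concrete with a small Python sketch using the sums of Table 93. It assumes (our choice, not stated in the text) that after a is rounded to 6.7, C is recomputed from the second normal equation, C = (ΣY − aΣX)/N; the function name is hypothetical.

```python
from fractions import Fraction

# Sums from Table 93
N, SUM_X, SUM_Y, SUM_XY, SUM_Y2 = 10, 690, 1_600, 110_600, 263_200


def residual_ss_shortcut(a):
    """Shortcut formula sum(Y^2) - a*sum(XY) - C*sum(Y), with C taken
    from the second normal equation, C = (sum(Y) - a*sum(X)) / N."""
    c = (SUM_Y - a * SUM_X) / N
    return SUM_Y2 - a * SUM_XY - c * SUM_Y


exact = residual_ss_shortcut(Fraction(20, 3))     # a = 6 2/3
rounded = residual_ss_shortcut(Fraction(67, 10))  # a rounded to 6.7
print(exact, rounded)  # 17600/3 5860
```

Under these assumptions the shortcut value falls from 5,866⅔ to 5,860. The gap equals the rounding error in a (1/30) multiplied by Σxy = 200, so it grows with the size of the sums, which is consistent with the warning that the discrepancy becomes of greater concern as the number of cases increases.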


In addition to the sum of squares for total and the sum of squares for residuals, a third value, known as the sum of squares for regression, can be computed. This sum of squares is the reduction obtained when the sum of squares away from the regression line is subtracted from the sum of squares away from the mean line. This difference may be thought of as the advantage gained by measuring deviations from the regression line rather than from the mean. A simple equation for computing the sum of squares for regression is
$$\text{Sum of squares for regression} = a\Sigma XY + C\Sigma Y - \frac{(\Sigma Y)^2}{N}$$
When the appropriate data for the heights and weights of the ten men are substituted,
$$\text{Sum of squares for regression} = \left(6\tfrac{2}{3}\right)(110{,}600) + (-300)(1{,}600) - \frac{(1{,}600)^2}{10} = 1{,}333\tfrac{1}{3}$$
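The three sums of squares, and the fact that regression plus residuals make up the total, can be verified with a short Python sketch. It uses only the column totals of Table 93 and the constants a = 6⅔ and C = −300; exact fractions keep the thirds intact.

```python
from fractions import Fraction

N, SUM_Y, SUM_XY, SUM_Y2 = 10, 1_600, 110_600, 263_200
A, C = Fraction(20, 3), Fraction(-300)

correction = Fraction(SUM_Y ** 2, N)             # (sum Y)^2 / N
ss_total = SUM_Y2 - correction                   # deviations from the mean line
ss_residuals = SUM_Y2 - A * SUM_XY - C * SUM_Y   # deviations from the regression line
ss_regression = A * SUM_XY + C * SUM_Y - correction

print(ss_total, ss_regression, ss_residuals)  # 7200 4000/3 17600/3
# The regression sum of squares is the reduction from total to residuals
assert ss_regression == ss_total - ss_residuals
```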
The three foregoing sums of squares are used in the statistical technique called the analysis of regression, which can be used to test the significance of the regression.
The analysis of regression becomes important whenever a population can be postulated from which the sample at hand might constitute a random sample. Usually when a regression equation is developed, it is for the purpose of prediction for individuals not in the sample, since prediction is of little importance when actual values are known.

TABLE 95. Raw Score Entries in Analysis of Regression Table

SOURCE OF     DEGREES OF
VARIATION     FREEDOM      SUM OF SQUARES              MEAN SQUARE
Regression    1            aΣXY + CΣY − (ΣY)²/N        aΣXY + CΣY − (ΣY)²/N
Residuals     N − 2        ΣY² − aΣXY − CΣY            (ΣY² − aΣXY − CΣY)/(N − 2)
Total         N − 1        ΣY² − (ΣY)²/N               [NΣY² − (ΣY)²]/[N(N − 1)]

For testing the significance of regression, the usual analysis of variance table should be prepared as in Table 95. If the data shown in Table 93 are tested to determine whether there is a significant relationship between heights and weights of men, the analysis is as shown in Table 96.

As usual, the number of degrees of freedom for total is one less than the number of cases in each distribution, which in this case gives nine. The number of degrees of freedom for regression is the number of prediction variables in the regression equation, which in this case is one, since height is the only variable used for prediction.

TABLE 96. Analysis of Regression of Weights on Heights of Men

SOURCE OF     DEGREES OF    SUM OF       MEAN
VARIATION     FREEDOM       SQUARES      SQUARE
Regression    1             1,333⅓       1,333⅓
Residuals     8             5,866⅔         733⅓
Total         9             7,200          800

$$F = \frac{1{,}333\tfrac{1}{3}}{733\tfrac{1}{3}} = 1.82 \qquad t = \sqrt{1.82} = 1.35$$

The sum of squares for residuals is usually obtained by subtracting the sum of squares for regression from the total sum of squares. However, it may also be obtained, as indicated earlier, from
$$\text{S.S. Residuals} = \Sigma Y^2 - a\Sigma XY - C\Sigma Y$$
The number of degrees of freedom for residuals is obtained by subtraction and in this case is eight.
To test the significance of the regression, F is found in the usual manner by dividing the mean square for regression by the mean square for residuals. An F-value of 1.82 is found, which is not significant. Thus, with the sample at hand, insufficient evidence is found to refute the null hypothesis that weights cannot be predicted from heights.

Whenever the number of degrees of freedom for the mean square in the numerator of F is one, t = √F, with the number of degrees of freedom associated with the denominator. Therefore, t = 1.35 with eight degrees of freedom. When a table of t is consulted, it can be seen that a relationship as great as or greater than the one found here would be expected in approximately one sample in every five drawn from a population in which heights and weights were uncorrelated.
The variance of weights is the mean square for total, or 800. The standard deviation is its square root, or 28.3 pounds. The standard error of estimate is the square root of the mean square for residuals, in this case 27.1 pounds. Thus, on the evidence given here, predictions of weights from heights for men can be expected to be wrong by no more than 27.1 pounds in approximately two-thirds of the predictions one might make. The reduction from the standard deviation of 28.3 pounds to the standard error of estimate of 27.1 pounds, a difference of only 1.2 pounds, is further evidence of the need for a larger sample if the null hypothesis is to be refuted.
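The quantities read from Table 96 can be recomputed in a few lines of Python. This is a sketch that takes the three sums of squares as given (N = 10, one prediction variable); it does not look up the probability of t, which still requires a table of t.

```python
import math
from fractions import Fraction

# Table 96 entries for the ten men (N = 10, one prediction variable)
N = 10
SS_REGRESSION = Fraction(4000, 3)
SS_RESIDUALS = Fraction(17600, 3)
SS_TOTAL = Fraction(7200)

ms_regression = SS_REGRESSION / 1
ms_residuals = SS_RESIDUALS / (N - 2)   # 733 1/3
ms_total = SS_TOTAL / (N - 1)           # 800, the variance of weights

f_ratio = ms_regression / ms_residuals  # 20/11, about 1.82
t = math.sqrt(f_ratio)                  # one df in the numerator, so t = sqrt(F)
sd = math.sqrt(ms_total)                # standard deviation of weights
se_estimate = math.sqrt(ms_residuals)   # standard error of estimate

print(f"F = {float(f_ratio):.2f}, t = {t:.2f}")                    # F = 1.82, t = 1.35
print(f"SD = {sd:.1f} lb, SE of estimate = {se_estimate:.1f} lb")  # SD = 28.3 lb, SE of estimate = 27.1 lb
```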


The coefficient of correlation may also be obtained from Table 96. It is the square root of the quotient of the sum of squares for regression divided by the total sum of squares. Thus,
$$r = \sqrt{\frac{1{,}333\tfrac{1}{3}}{7{,}200}} = 0.43$$
Incidentally, the test of the significance of the regression is also a test of the significance of the coefficient of correlation. Although a sizable coefficient of correlation is found between weights and heights of men, the probability of obtaining an r numerically as large as 0.43 or larger (with the resulting t = 1.35) from uncorrelated weights and heights is about one out of five in a sample of this size.

The foregoing example illustrates the desirability of large samples. It is common knowledge that weights and heights are positively correlated. If, however, it were necessary to draw such an inference from a sample such as the one reported here, it would be necessary to report that the evidence is insufficient at the 5 per cent level to establish any relationship between heights and weights.


A deviation score is the difference between a raw score and the mean, and is designated by a lower-case letter. Thus, deviation scores for weights and heights could be written
$$y = Y - \bar{Y}$$
$$x = X - \bar{X}$$
The use of deviation scores rather than raw scores when computing the regression equation simplifies the process, since the general form of the equation changes from Y = aX + C to y = ax. A value of a is desired such that if a y-value is predicted for each x-value,
$$\Sigma(y - ax)^2 = \text{a minimum}$$
This a-value may be obtained by differentiating with respect to a and setting the first derivative equal to zero; thus
$$\Sigma(y - ax)(-2x) = 0$$
After simplification the normal equation results:
$$\Sigma xy = a\Sigma x^2, \quad \text{or} \quad a = \frac{\Sigma xy}{\Sigma x^2}$$
The needed values can be computed by substituting the raw data in Table 93 in the following equations:
$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N} = 47{,}640 - \frac{(690)^2}{10} = 30$$
$$\Sigma xy = \Sigma XY - \frac{(\Sigma X)(\Sigma Y)}{N} = 110{,}600 - \frac{(690)(1{,}600)}{10} = 200$$
Therefore,
$$a = \frac{200}{30} = 6\tfrac{2}{3}$$
The regression equation in deviation form is then
$$y = 6\tfrac{2}{3}x$$


If it were desirable to change the equation to raw score form, a very simple method is available. Since deviation scores have been defined as y = Y − Ȳ and x = X − X̄, the equation y = ax can be changed to
$$(Y - \bar{Y}) = a(X - \bar{X})$$
Substitution of the data yields
$$\left(Y - \frac{1{,}600}{10}\right) = 6\tfrac{2}{3}\left(X - \frac{690}{10}\right)$$
$$Y = 6\tfrac{2}{3}X - 300$$
Thus the deviation score method and the raw score method have produced the same results.
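The deviation score route can be sketched in Python in the same way, again assuming the data of Table 93. Only one normal equation is solved; C is then recovered from the means, exactly as in the conversion to raw score form.

```python
from fractions import Fraction

# Table 93: weights Y (pounds) and heights X (inches)
WEIGHTS = [160, 200, 140, 150, 130, 160, 150, 150, 140, 220]
HEIGHTS = [70, 72, 68, 66, 67, 69, 68, 71, 70, 69]

n = len(HEIGHTS)
mean_x = Fraction(sum(HEIGHTS), n)
mean_y = Fraction(sum(WEIGHTS), n)

# Deviation scores x = X - mean(X), y = Y - mean(Y)
x = [v - mean_x for v in HEIGHTS]
y = [v - mean_y for v in WEIGHTS]

sum_x2 = sum(d * d for d in x)             # 30
sum_xy = sum(u * v for u, v in zip(x, y))  # 200
a = sum_xy / sum_x2                        # the single normal equation

# Raw-score form: (Y - mean_y) = a (X - mean_x), so C = mean_y - a*mean_x
c = mean_y - a * mean_x
print(sum_x2, sum_xy, a, c)  # 30 200 20/3 -300
```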


The advantage of computing a single variable regression equation in deviation form lies not only in the fact that a single normal equation is necessary, but also in the fact that the sums of squares due to regression and residuals can be easily calculated. Since the sum of squares due to regression is equal to aΣxy, and since
$$a = \frac{\Sigma xy}{\Sigma x^2}$$
the sum of squares for regression is equal to
$$\frac{(\Sigma xy)^2}{\Sigma x^2}$$
The complete analysis of regression is shown in Table 97.
For the example at hand,
$$\text{For total, S.S.} = \Sigma y^2 = \Sigma Y^2 - \frac{(\Sigma Y)^2}{N} = 263{,}200 - \frac{(1{,}600)^2}{10} = 7{,}200$$
$$\text{For regression, S.S.} = \frac{(\Sigma xy)^2}{\Sigma x^2} = \frac{(200)^2}{30} = 1{,}333\tfrac{1}{3}$$
$$\text{For residuals, S.S.} = \Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2} = 7{,}200 - \frac{(200)^2}{30} = 5{,}866\tfrac{2}{3}$$


TABLE 97. Deviation Score Entries in Analysis of Regression Table

SOURCE OF     DEGREES OF
VARIATION     FREEDOM      SUM OF SQUARES         MEAN SQUARE
Regression    1            (Σxy)²/Σx²             (Σxy)²/Σx²
Residuals     N − 2        Σy² − (Σxy)²/Σx²       [Σy² − (Σxy)²/Σx²]/(N − 2)
Total         N − 1        Σy²                    Σy²/(N − 1)


It should be noted that these values are identical with those in Table 96. 


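It is interesting to note that the entries in Table 97 can be expressed in terms of the coefficient of correlation, since
$$r_{xy} = \frac{\Sigma xy}{\sqrt{(\Sigma x^2)(\Sigma y^2)}}$$
After squaring, the equation becomes
$$r_{xy}^2 = \frac{(\Sigma xy)^2}{(\Sigma x^2)(\Sigma y^2)}, \quad \text{so that} \quad \frac{(\Sigma xy)^2}{\Sigma x^2} = r_{xy}^2\,\Sigma y^2$$
By means of this equality, the entries in Table 97 can be changed to those in Table 98.

TABLE 98. Entries in Terms of the Coefficient of Correlation in the Analysis of Regression Table

SOURCE OF     DEGREES OF
VARIATION     FREEDOM      SUM OF SQUARES      MEAN SQUARE
Regression    1            r²Σy²               r²Σy²
Residuals     N − 2        (1 − r²)Σy²         (1 − r²)Σy²/(N − 2)
Total         N − 1        Σy²

$$F = \frac{r_{xy}^2\,\Sigma y^2}{(1 - r_{xy}^2)\,\Sigma y^2/(N - 2)} = \frac{r_{xy}^2(N - 2)}{1 - r_{xy}^2} \qquad t = \frac{r_{xy}\sqrt{N - 2}}{\sqrt{1 - r_{xy}^2}}$$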


The t-value in the foregoing equation has N — 2 degrees of freedom. 
This equation illustrates that the analysis of regression leads to a test 
of the significance of the difference between an r-value and zero. The
t-equation affords such a test for any r-value without resorting to the 
computation of an analysis of regression. 
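The t-equation can be applied directly to the height and weight example, as in the following Python sketch. The deviation sums Σx² = 30, Σy² = 7,200, and Σxy = 200 are those computed earlier for Table 93; the function name `r_and_t` is hypothetical.

```python
import math

# Deviation sums for Table 93 (computed earlier in the chapter)
SUM_X2, SUM_Y2, SUM_XY, N = 30, 7_200, 200, 10


def r_and_t(sum_xy, sum_x2, sum_y2, n):
    """Pearson r from deviation sums, and t = r*sqrt(N-2)/sqrt(1-r^2)
    with N - 2 degrees of freedom."""
    r = sum_xy / math.sqrt(sum_x2 * sum_y2)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return r, t


r, t = r_and_t(SUM_XY, SUM_X2, SUM_Y2, N)
print(f"r = {r:.2f}, t = {t:.2f}, t^2 = {t * t:.2f}")  # r = 0.43, t = 1.35, t^2 = 1.82
```

The squared t reproduces the F of 1.82 in Table 96, so no analysis of regression table is needed to reach the same conclusion.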




To simplify the test even more, r-values at the 5 per cent and 1 per 
cent levels of significance at various degrees of freedom have been com- 
puted from the foregoing formula and tabulated as shown in the Ap- 
pendix. By means of this table, testing the significance of the difference 
between any r-value and zero can be made by merely noting whether 
the table value for the appropriate degrees of freedom is larger or 
smaller than the value being tested. 
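Solving the t-equation for r shows how such a table of critical r-values can be built: r = t/√(t² + df). The Python sketch below is illustrative only. It assumes the two-tailed critical t-values for 8 degrees of freedom (2.306 at 5 per cent, 3.355 at 1 per cent) taken from a standard t table, and does not reproduce the book's Appendix.

```python
import math


def critical_r(t_crit, df):
    """Invert t = r*sqrt(df)/sqrt(1 - r^2) to get the smallest
    significant r for a given critical t and df = N - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-tailed critical t-values for 8 df, assumed from a standard t table
T_05_DF8, T_01_DF8 = 2.306, 3.355
print(f"{critical_r(T_05_DF8, 8):.3f} {critical_r(T_01_DF8, 8):.3f}")  # 0.632 0.765
```

Since the r of 0.43 found for heights and weights is smaller than 0.632, it is not significant at the 5 per cent level, in agreement with the analysis of regression.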


MULTIPLE REGRESSION 


The principle of linear regression may be readily applied to more than 
one prediction variable. The regression equation may be expressed 


$$Y = a_1X_1 + a_2X_2 + \cdots + a_mX_m + C$$

In most educational problems, because of the intercorrelation of variables, 
little advantage is gained by the addition of variables beyond three or 
four, and even then the number of cases must be large to yield greater 
forecasting efficiency than can be had by using one or two variables. 

For the purpose of illustration, it is assumed that the problem consists of predicting a criterion Y from three prediction variables X₁, X₂, and X₃. The regression equation is, therefore,
$$Y = a_1X_1 + a_2X_2 + a_3X_3 + C$$
As in the case of single variable regression, values of a₁, a₂, a₃, and C are desired such that the sum of squares of the errors between actual and predicted Y-values is a minimum. The normal equations can be obtained by differentiating the expression
$$\Sigma(Y - a_1X_1 - a_2X_2 - a_3X_3 - C)^2$$
with respect to a₁, a₂, a₃, and C respectively. When each of the first derivatives is set equal to zero, the resulting equations are:
$$\Sigma(Y - a_1X_1 - a_2X_2 - a_3X_3 - C)(-2X_1) = 0$$
$$\Sigma(Y - a_1X_1 - a_2X_2 - a_3X_3 - C)(-2X_2) = 0$$
$$\Sigma(Y - a_1X_1 - a_2X_2 - a_3X_3 - C)(-2X_3) = 0$$
$$\Sigma(Y - a_1X_1 - a_2X_2 - a_3X_3 - C)(-2) = 0$$
After simplification, the equations become
$$\Sigma X_1Y = a_1\Sigma X_1^2 + a_2\Sigma X_1X_2 + a_3\Sigma X_1X_3 + C\Sigma X_1$$
$$\Sigma X_2Y = a_1\Sigma X_1X_2 + a_2\Sigma X_2^2 + a_3\Sigma X_2X_3 + C\Sigma X_2$$
$$\Sigma X_3Y = a_1\Sigma X_1X_3 + a_2\Sigma X_2X_3 + a_3\Sigma X_3^2 + C\Sigma X_3$$
$$\Sigma Y = a_1\Sigma X_1 + a_2\Sigma X_2 + a_3\Sigma X_3 + NC$$
In any given problem, the normal equations may be solved simultaneously for values of a₁, a₂, a₃, and C.
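The pattern of these normal equations is regular: the equation obtained by differentiating with respect to aᵢ has ΣXᵢY on the left and ΣXᵢXⱼ as the coefficient of aⱼ, with a column of ones standing in for C. A minimal Python sketch (the function name `normal_equations` is hypothetical) builds the system for any number of prediction variables; as a check it is applied to the single prediction variable of Table 93.

```python
def normal_equations(rows, y):
    """Return (matrix, rhs) of the m + 1 raw-score normal equations for
    Y = a1*X1 + ... + am*Xm + C.

    rows holds one list [X1, ..., Xm] per individual; a column of ones is
    appended for C, so the last equation reads
    sum(Y) = a1*sum(X1) + ... + am*sum(Xm) + N*C."""
    augmented = [list(r) + [1] for r in rows]
    k = len(augmented[0])
    matrix = [[sum(obs[i] * obs[j] for obs in augmented) for j in range(k)]
              for i in range(k)]
    rhs = [sum(obs[i] * yv for obs, yv in zip(augmented, y)) for i in range(k)]
    return matrix, rhs


# Check with the single prediction variable of Table 93 (m = 1)
heights = [[70], [72], [68], [66], [67], [69], [68], [71], [70], [69]]
weights = [160, 200, 140, 150, 130, 160, 150, 150, 140, 220]
matrix, rhs = normal_equations(heights, weights)
print(matrix, rhs)  # [[47640, 690], [690, 10]] [110600, 1600]
```

These are exactly the coefficients of equations (1) and (2) solved earlier for the heights and weights of the ten men.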


In the analysis of regression, the total sum of squares may be broken into two portions: that explained by the regression and that remaining in the residuals. Once again the sums of squares for total, residuals, and regression have spatial counterparts, although pictorial representation is difficult or impossible when more than two prediction variables are considered.
Insofar as the calculations are concerned, the sum of squares for total is, as usual,
$$\text{S.S. Total} = \Sigma Y^2 - \frac{(\Sigma Y)^2}{N}$$
The sum of squares for residuals is then
$$\text{S.S. Residuals} = \Sigma Y^2 - a_1\Sigma X_1Y - a_2\Sigma X_2Y - a_3\Sigma X_3Y - C\Sigma Y$$
The sum of squares for regression may be had by subtraction or by the formula
$$\text{S.S. Regression} = a_1\Sigma X_1Y + a_2\Sigma X_2Y + a_3\Sigma X_3Y + C\Sigma Y - \frac{(\Sigma Y)^2}{N}$$
The analysis of variance may be arranged as in Table 99. Mean squares may be found by dividing the sums of squares by the appropriate numbers of degrees of freedom.

TABLE 99. Raw Score Entries in Analysis of Multiple Regression

SOURCE OF     DEGREES OF
VARIATION     FREEDOM       SUM OF SQUARES
Regression    m             a₁ΣX₁Y + a₂ΣX₂Y + ··· + aₘΣXₘY + CΣY − (ΣY)²/N
Residuals     N − m − 1     ΣY² − a₁ΣX₁Y − a₂ΣX₂Y − ··· − aₘΣXₘY − CΣY
Total         N − 1         ΣY² − (ΣY)²/N


In many cases the coefficient of multiple correlation is de 


rion as well as the interregressions among the variables. Nonlinear regressions may be rectified, however, and utilized in the foregoing formulas.
As an example of multiple regression, the information given in Table 100 may be used. Algebra review examination scores are to be predicted from scholastic aptitude scores and high school grade-point averages.

TABLE 100. Algebra Review Examination Scores, High School Grade-point Averages, and Scholastic Aptitude Scores of Twenty-two College Students

           APT.   H. S. AV.   ALG.                APT.   H. S. AV.   ALG.
STUDENT     X₁       X₂         Y      STUDENT     X₁       X₂         Y
   1        57      3.00       27        12        24      2.38       13
   2        93      2.85       34        13         7      2.29       13
   3        79      3.20       27        14        88      3.72       40
   4        26      2.49       24        15        57      2.65       23
   5        69      3.07       35        16        20      2.56       20
   6        24      2.38       18        17         3      2.97       25
   7        76      3.74       33        18        54      3.24       21
   8        61      2.62       39        19        64      1.84       19
   9        82      2.53       35        20        76      2.54       17
  10        29      3.17       25        21        89      3.66       35
  11        36      3.24       29        22        18      2.80       17

It should be noted that only two prediction variables are considered here, although the preceding discussion dealt with three. The values necessary for the solution of the normal equations may be computed and are as follows:

N = 22
ΣY = 569         ΣY² = 16,137        ΣX₁Y = 32,603
ΣX₁ = 1,132      ΣX₁² = 75,570       ΣX₂Y = 1,679.50
ΣX₂ = 62.94      ΣX₂² = 185.1656     ΣX₁X₂ = 3,356.20


The normal equations are found in the usual manner. The expression
$$\Sigma(Y - a_1X_1 - a_2X_2 - C)^2$$
is differentiated with respect to a₁, a₂, and C respectively. Each of the first derivatives is set equal to zero and simplified, so that
$$\Sigma X_1Y = a_1\Sigma X_1^2 + a_2\Sigma X_1X_2 + C\Sigma X_1$$
$$\Sigma X_2Y = a_1\Sigma X_1X_2 + a_2\Sigma X_2^2 + C\Sigma X_2$$
$$\Sigma Y = a_1\Sigma X_1 + a_2\Sigma X_2 + NC$$
Upon substitution, the normal equations are
$$32{,}603 = 75{,}570a_1 + 3{,}356.20a_2 + 1{,}132C$$
$$1{,}679.50 = 3{,}356.20a_1 + 185.1656a_2 + 62.94C$$
$$569 = 1{,}132a_1 + 62.94a_2 + 22C$$
Solving the simultaneous equations¹ for a₁, a₂, and C, the values obtained are
$$a_1 = 0.14607215 \qquad a_2 = 6.7563390 \qquad C = -0.98171142$$
The multiple regression equation is, therefore,
$$Y = 0.14607215X_1 + 6.7563390X_2 - 0.98171142$$
from which the analysis of regression can now be computed.

¹ A complete solution of a multiple regression problem is given in the Appendix.
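The solution of the three normal equations can be reproduced in Python from the 22 rows of Table 100. The sketch below is not the book's Appendix procedure: it uses ordinary Gauss-Jordan elimination in exact fractions, and first asserts that the data reproduce the sums listed in the text.

```python
from fractions import Fraction

# Table 100: (scholastic aptitude X1, H. S. grade-point average X2, algebra score Y)
DATA = [
    (57, "3.00", 27), (93, "2.85", 34), (79, "3.20", 27), (26, "2.49", 24),
    (69, "3.07", 35), (24, "2.38", 18), (76, "3.74", 33), (61, "2.62", 39),
    (82, "2.53", 35), (29, "3.17", 25), (36, "3.24", 29), (24, "2.38", 13),
    (7, "2.29", 13), (88, "3.72", 40), (57, "2.65", 23), (20, "2.56", 20),
    (3, "2.97", 25), (54, "3.24", 21), (64, "1.84", 19), (76, "2.54", 17),
    (89, "3.66", 35), (18, "2.80", 17),
]
# Exact values; the trailing 1 is the "variable" that goes with C
rows = [(Fraction(x1), Fraction(x2), Fraction(1)) for x1, x2, _ in DATA]
ys = [Fraction(y) for _, _, y in DATA]

k = 3
matrix = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
rhs = [sum(r[i] * yv for r, yv in zip(rows, ys)) for i in range(k)]
# These should reproduce the sums listed in the text
assert matrix[0][0] == 75_570 and matrix[0][1] == Fraction("3356.20")
assert matrix[1][1] == Fraction("185.1656") and rhs == [32_603, Fraction("1679.50"), 569]

# Gauss-Jordan elimination on the augmented matrix, in exact arithmetic
aug = [row + [b] for row, b in zip(matrix, rhs)]
for col in range(k):
    pivot_row = next(r for r in range(col, k) if aug[r][col] != 0)
    aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
    pivot = aug[col][col]
    aug[col] = [v / pivot for v in aug[col]]
    for r in range(k):
        if r != col:
            factor = aug[r][col]
            aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]

a1, a2, c = (float(row[-1]) for row in aug)
print(f"a1 = {a1:.6f}, a2 = {a2:.5f}, C = {c:.5f}")
# a1 = 0.146072, a2 = 6.75634, C = -0.98171
```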
The sum of squares for total is
$$\Sigma Y^2 - \frac{(\Sigma Y)^2}{N} = 16{,}137 - \frac{(569)^2}{22}$$
which yields 1,420.59, the value entered in Table 101. The sum of squares for regression is
$$a_1\Sigma X_1Y + a_2\Sigma X_2Y + C\Sigma Y - \frac{(\Sigma Y)^2}{N}$$
or
$$(32{,}603)(0.14607215) + (1{,}679.50)(6.7563390) - (569)(0.98171142) - \frac{(569)^2}{22} = 834.66$$
The sum of squares for residuals is obtained by subtraction, i.e.,
$$1{,}420.59 - 834.66 = 585.93$$
The number of degrees of freedom for total is (N − 1), or 21; for regression it is the number of prediction variables, or 2; and for residuals it is the difference, (N − m − 1), or 19. The mean squares are found by dividing the sums of squares by the appropriate numbers of degrees of freedom and are shown in Table 101.


TABLE 101. Analysis of Multiple Regression of Algebra Scores

SOURCE OF     DEGREES OF    SUM OF       MEAN
VARIATION     FREEDOM       SQUARES      SQUARE
Regression     2              834.66     417.33
Residuals     19              585.93      30.84
Total         21            1,420.59

$$F = \frac{417.33}{30.84} = 13.53 \qquad R_{Y.12} = \sqrt{\frac{834.66}{1{,}420.59}} = 0.767$$

F is the ratio of the regression mean square to the residual mean square and is 13.53 with 2 and 19 degrees of freedom. It is significant beyond the 1 per cent level.

The coefficient of multiple correlation may be computed by extracting the square root of the ratio of the regression sum of squares to the total sum of squares and yields $R_{Y.12}$ = 0.767.

The standard error of estimate is the square root of the mean square for residuals, or 5.55. Therefore, predictions of algebra scores by means of the foregoing regression equation will not be wrong by more than 5.55 points in approximately two-thirds of the predictions one might make.
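The three summary figures for the algebra example follow from Table 101 alone, as this short Python sketch shows. It takes the sums of squares as given (N = 22 students, m = 2 prediction variables); the printed values are rounded as in the text.

```python
import math

SS_REG, SS_RES, SS_TOT = 834.66, 585.93, 1_420.59
N, M = 22, 2  # students and prediction variables

ms_reg = SS_REG / M                      # 417.33
ms_res = SS_RES / (N - M - 1)            # about 30.84
f_ratio = ms_reg / ms_res                # with M and N - M - 1 df
multiple_r = math.sqrt(SS_REG / SS_TOT)  # coefficient of multiple correlation
se_estimate = math.sqrt(ms_res)          # standard error of estimate

print(f"F = {f_ratio:.2f}, R = {multiple_r:.3f}, SE = {se_estimate:.2f}")
# F = 13.53, R = 0.767, SE = 5.55
```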

As in the case of single variable regression, the deviation score method 
can be used to calculate the prediction equation for multiple regression. 
The general equation in deviation form differs from the equation in raw 
score form in that the C term has again disappeared. Thus the number 
of normal equations necessary has been reduced by one. Whereas the 
raw score computations require one more normal equation than the num- 
ber of prediction variables present, the deviation score method requires 
the same number of normal equations as prediction variables used. This 
labor-saving aspect is the principal advantage of the deviation score 
method over the raw score method, particularly as the number of predic- 
tion variables increases. 

In the two-variable regression considered in the foregoing discussion, the regression equation in deviation form is
$$y = a_1x_1 + a_2x_2$$
When the expression Σ(y − a₁x₁ − a₂x₂)² is differentiated with respect to a₁ and a₂ respectively, and each of the derivatives is set equal to zero, the resulting normal equations are
$$\Sigma x_1y = a_1\Sigma x_1^2 + a_2\Sigma x_1x_2$$
$$\Sigma x_2y = a_1\Sigma x_1x_2 + a_2\Sigma x_2^2$$
The sums of squares and cross products of the deviations from the means can be calculated from the raw scores in the following manner:
$$\Sigma x_1^2 = \Sigma X_1^2 - \frac{(\Sigma X_1)^2}{N} = 75{,}570 - \frac{(1{,}132)^2}{22} = 17{,}323.455$$
$$\Sigma x_2^2 = \Sigma X_2^2 - \frac{(\Sigma X_2)^2}{N} = 185.1656 - \frac{(62.94)^2}{22} = 5.10$$
$$\Sigma x_1y = \Sigma X_1Y - \frac{(\Sigma X_1)(\Sigma Y)}{N} = 32{,}603 - \frac{(1{,}132)(569)}{22} = 3{,}325.364$$
$$\Sigma x_2y = \Sigma X_2Y - \frac{(\Sigma X_2)(\Sigma Y)}{N} = 1{,}679.50 - \frac{(62.94)(569)}{22} = 51.643$$
$$\Sigma x_1x_2 = \Sigma X_1X_2 - \frac{(\Sigma X_1)(\Sigma X_2)}{N} = 3{,}356.20 - \frac{(1{,}132)(62.94)}{22} = 117.65$$
Substitution of these values into the normal equations yields
$$3{,}325.364 = 17{,}323.455a_1 + 117.65a_2$$
$$51.643 = 117.65a_1 + 5.10a_2$$
Simultaneous solution of the equations for a₁ and a₂ gives essentially the same values found by the raw score method:
$$a_1 = 0.14607214 \qquad a_2 = 6.7563946$$
The regression equation in deviation form is, therefore,
$$y = 0.14607214x_1 + 6.7563946x_2$$
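With deviation scores only two equations remain, and Cramer's rule solves them directly. The Python sketch below uses the rounded deviation sums printed above (17,323.455, 117.65, 5.10, 3,325.364, 51.643); the function name `solve_two` is hypothetical. Because 117.65 and 5.10 are rounded, a₂ agrees with the raw score value only to about four decimal places.

```python
# Deviation sums for Table 100, as rounded in the text
SX1Y, SX2Y = 3_325.364, 51.643
SX1X1, SX2X2, SX1X2 = 17_323.455, 5.10, 117.65


def solve_two(s11, s12, s22, s1y, s2y):
    """Cramer's rule for  s1y = a1*s11 + a2*s12,  s2y = a1*s12 + a2*s22."""
    det = s11 * s22 - s12 * s12
    a1 = (s1y * s22 - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    return a1, a2


a1, a2 = solve_two(SX1X1, SX1X2, SX2X2, SX1Y, SX2Y)
# Back to raw-score form: C = mean(Y) - a1*mean(X1) - a2*mean(X2)
c = 569 / 22 - a1 * 1_132 / 22 - a2 * 62.94 / 22
print(f"a1 = {a1:.5f}, a2 = {a2:.4f}, C = {c:.4f}")  # a1 = 0.14607, a2 = 6.7564, C = -0.9819
```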


To convert to raw score form, the difference between the raw score and its mean is substituted for each deviation score, such that
$$(Y - \bar{Y}) = a_1(X_1 - \bar{X}_1) + a_2(X_2 - \bar{X}_2)$$
or
$$\left(Y - \frac{569}{22}\right) = 0.14607214\left(X_1 - \frac{1{,}132}{22}\right) + 6.7563946\left(X_2 - \frac{62.94}{22}\right)$$
$$Y = 0.14607214X_1 + 6.7563946X_2 - 0.98186772$$
The sums of squares for regression and residuals may be computed from the formulas:
$$\text{S.S. Regression} = a_1\Sigma x_1y + a_2\Sigma x_2y = (0.14607214)(3{,}325.364) + (6.7563946)(51.643) = 834.66$$
$$\text{S.S. Residuals} = \Sigma y^2 - a_1\Sigma x_1y - a_2\Sigma x_2y = 1{,}420.59 - (0.14607214)(3{,}325.364) - (6.7563946)(51.643) = 585.93$$


In this manner the values shown in Table 101 have been obtained. As in the case of single variable regression, the relation
$$R^2_{Y.(1,2,\ldots,m)} = \frac{a_1\Sigma x_1y + a_2\Sigma x_2y + \cdots + a_m\Sigma x_my}{\Sigma y^2}$$
where m is the number of prediction variables, can be used to change the table entries in the analysis of multiple regression, as shown in Table 102.

TABLE 102. Entries in Terms of Multiple Coefficient of Correlation in Analysis of Multiple Regression

SOURCE OF     DEGREES OF    SUM OF          MEAN
VARIATION     FREEDOM       SQUARES         SQUARE
Regression    m             R²Σy²           R²Σy²/m
Residuals     N − m − 1     (1 − R²)Σy²     (1 − R²)Σy²/(N − m − 1)
Total         N − 1         Σy²

$$F = \frac{R^2\Sigma y^2/m}{(1 - R^2)\Sigma y^2/(N - m - 1)} = \frac{R^2(N - m - 1)}{m(1 - R^2)}$$
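Because Σy² cancels, the F-equation needs only R², N, and m. The Python sketch below applies it to the algebra example (R² = 834.66/1,420.59, N = 22, m = 2); the function name `f_from_r2` is hypothetical.

```python
def f_from_r2(r2, n, m):
    """F = R^2 (N - m - 1) / [m (1 - R^2)] with m and N - m - 1 df."""
    return r2 * (n - m - 1) / (m * (1 - r2))


r2 = 834.66 / 1_420.59  # R^2 for the algebra-score example
print(f"R^2 = {r2:.4f}, F = {f_from_r2(r2, 22, 2):.2f}")  # R^2 = 0.5875, F = 13.53
```

With m = 1 the same function reduces to r²(N − 2)/(1 − r²), the square of the t-equation given earlier for a single prediction variable.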




The significance of the difference between a multiple coefficient of correlation and zero may be tested by means of the analysis of regression directly or by using the F-equation derived from that analysis. In many instances testing such a hypothesis is of incidental value, because the R-values obviously differ greatly from zero.

It should be noted that when more than one variable is used to predict a criterion, the relative influence of each of the prediction variables with respect to any other cannot be inferred from a direct comparison of the sizes of the coefficients of the variables, since the size of each coefficient depends on the unit in which its variable is measured.

If the relative importance of variables in the prediction of a criterion is desired, such information is available from the terms of the sum of squares for regression obtained in the regression analysis. As seen in Table 102, the ratio of the sum of squares for regression to the total sum of squares, R², is the proportion of the total variance accounted for by the use of a battery of variables. The contribution of each variable may be noted from the example shown in Table 100, for which the sum of squares for regression became
$$(0.14607214)(3{,}325.364) + (6.7563946)(51.643)$$
or 485.74 + 348.92 = 834.66.

Since the total sum of squares was found to be 1,420.59, the proportion of the variance associated with the prediction of these examination scores from scholastic aptitude scores and high school grade-point averages together was 0.5875. The proportion associated with the former was 0.3419 and with the latter 0.2456, a ratio to each other approximating 7 to 5.
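The proportions just quoted can be recomputed term by term in a short Python sketch, using the coefficients and deviation sums of the algebra example. The dictionary labels are ours, for display only.

```python
# Terms of the regression sum of squares for Table 100 (deviation form)
TERMS = {
    "scholastic aptitude (X1)": 0.14607214 * 3_325.364,
    "H. S. grade-point average (X2)": 6.7563946 * 51.643,
}
SS_TOTAL = 1_420.59

for name, term in TERMS.items():
    print(f"{name}: {term:.2f}, proportion {term / SS_TOTAL:.4f}")
print(f"together: {sum(TERMS.values()) / SS_TOTAL:.4f}")
# scholastic aptitude (X1): 485.74, proportion 0.3419
# H. S. grade-point average (X2): 348.92, proportion 0.2456
# together: 0.5875
```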

Usually, but not always, all of the terms to be added will carry a positive sign. Whenever a negative term appears, the relative contribution of each variable to the total sum of squares for regression must be modified: the sum of squares accounted for by regression must be divided in proportion to the absolute values (sign disregarded) of the terms. Thus, if the terms of the sum of squares for regression,
$$a_1\Sigma x_1y + a_2\Sigma x_2y + a_3\Sigma x_3y$$
in some solution turned out to be
$$60 + 80 - 20$$
the sum of squares for regression is 120, which must be divided in the proportion each absolute number bears to the absolute sum of 160. Thus, in terms of sums of squares, the relative contributions are:
$$\text{For first variable: } \frac{60}{160}(120) = 45$$
$$\text{For second variable: } \frac{80}{160}(120) = 60$$
$$\text{For third variable: } \frac{20}{160}(120) = 15$$
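The rule for a negative term can be expressed as a small Python function (the name `apportion` is hypothetical). It expects the terms as integers or fractions and divides their signed total in proportion to their absolute values.

```python
from fractions import Fraction


def apportion(terms):
    """Split the regression sum of squares (the signed total of the terms)
    among the variables in proportion to the absolute value of each term."""
    total = sum(terms)
    abs_total = sum(abs(t) for t in terms)
    return [Fraction(abs(t), abs_total) * total for t in terms]


print(apportion([60, 80, -20]))  # [Fraction(45, 1), Fraction(60, 1), Fraction(15, 1)]
```

When every term is positive the function returns the terms unchanged, which is the usual case described in the text.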




Care needs to be taken in the interpretation of relative importance. The results of the mathematical procedure suggested here must be interpreted in terms of the variables themselves, not in terms of the characteristics which the variables are presumed to measure.

In some situations a decision is necessary concerning the advisability of adding prediction variables. Valuable information can often be obtained by determining which of the prediction variables used in a regression contribute substantially to the efficiency of the prediction. A method by which this information can be obtained is illustrated by an extension of the foregoing example.

After predicting algebra scores from high school grade-point averages and scholastic aptitude scores, the investigator may be interested in knowing whether a significant loss in ability to predict algebra scores will be incurred if the scholastic aptitude scores are eliminated from the prediction scheme. To secure this information the equation for predicting algebra scores from high school grade-point averages alone must be calculated. This equation may be found from the values used in solving the multiple regression by the raw score or the deviation score method. In the case of the raw score method, the normal equations, upon substitution, become

$$1679.50 = 185.1656a_2 + 62.94C$$

$$569 = 62.94a_2 + 22C$$

Simultaneous solution of these equations for values of $a_2$ and $C$ gives the regression equation

$$Y' = 10.126060X_2 - 3.106101$$

The sum of squares for regression is

$$(10.126060)(1679.50) - (3.106101)(569) - \frac{(569)^2}{22} = 522.94$$
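The simultaneous solution can be sketched in Python. This assumes only the raw-score sums substituted in the normal equations above and solves the 2 × 2 system by Cramer's rule (any method of simultaneous solution would serve); the variable names are illustrative.

```python
# Raw-score sums for predicting algebra scores (Y) from high school
# grade-point averages (X2) alone, as substituted in the normal equations.
n = 22
sum_x, sum_y = 62.94, 569.0
sum_x_sq, sum_xy = 185.1656, 1679.50

# Normal equations:  sum_xy = a2*sum_x_sq + C*sum_x
#                    sum_y  = a2*sum_x    + C*n
# Solved by Cramer's rule for the 2 x 2 system.
det = sum_x_sq * n - sum_x * sum_x
a2 = (sum_xy * n - sum_x * sum_y) / det
C = (sum_x_sq * sum_y - sum_x * sum_xy) / det

# Raw-score regression sum of squares: a2*sum(XY) + C*sum(Y) - (sum Y)^2 / N
ss_regression = a2 * sum_xy + C * sum_y - sum_y ** 2 / n
print(f"Y' = {a2:.6f} X2 + ({C:.6f});  SS for regression = {ss_regression:.2f}")
```

The correction term (ΣY)²/N is what turns the raw-score products into a sum of squares about the mean.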


This information can now be combined with that in Table 101, as shown 
in Table 103. 


TABLE 103. Testing the Loss Due to the Elimination of Scholastic Aptitude Scores

                                               DEGREES OF    SUM OF     MEAN
SOURCE OF VARIATION                              FREEDOM     SQUARES    SQUARE
Two-Variable Regression                             2         834.66
H. S. Average Regression                            1         522.94
Loss Due to Elimination of Aptitude Scores*         1         311.72    311.72
Two-Variable Residuals                             19         585.93     30.84
Total                                              21        1420.59

* This may also be designated as advantage of adding aptitude scores.




It is apparent that the sum of squares for multiple regression can be subdivided into two parts, one associated with the high school grade-point average regression and the other associated with the loss due to the elimination of the aptitude scores from the regression scheme when predicting algebra scores. It is now possible to test whether this loss is significant. For this test,

$$F = \frac{311.72}{30.84} = 10.11$$

with 1 and 19 degrees of freedom. Since this F-value is significant, there is an appreciable loss in ability to predict algebra scores when aptitude scores are eliminated from the regression.

The investigator may also be interested in knowing whether a significant loss in ability to predict algebra scores will result from the elimination of the high school grade-point averages from the regression. To obtain this information the foregoing method of calculating the single-variable sum of squares due to regression must be repeated, this time using the aptitude scores as the prediction variable. The results are shown in Table 104.

TABLE 104. Testing the Loss Due to the Elimination of High School Grade-Point Averages

                                               DEGREES OF    SUM OF     MEAN
SOURCE OF VARIATION                              FREEDOM     SQUARES    SQUARE
Two-Variable Regression                             2         834.66
Scholastic Aptitude Regression                      1         638.33
Loss Due to Elimination of H. S. Averages           1         196.33    196.33
Two-Variable Residuals                             19         585.93     30.84
Total                                              21        1420.59

$$F = \frac{196.33}{30.84} = 6.37$$

The F-value of 6.37 is significant, which is evidence of a significant loss in ability to predict when the high school grade-point averages are eliminated.


It is possible to interpret the difference between the sums of squares for the two-variable regression and the single-variable regression in two ways. This difference may be considered to be the advantage of adding a variable or the loss due to the elimination of that variable. In the former case the significant F-values of Tables 103 and 104 indicate that the advantage of adding either variable would be appreciable.
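A minimal Python sketch of the F-test behind Tables 103 and 104 follows, assuming only the sums of squares printed there; `loss_f_ratio` is a hypothetical helper name. The loss mean square is divided by the residual mean square of the full two-variable equation.

```python
def loss_f_ratio(ss_full, ss_reduced, ss_residual, df_residual, df_loss=1):
    """F for the loss in regression SS when predictors are dropped: the loss
    mean square divided by the residual mean square of the full equation."""
    loss_mean_square = (ss_full - ss_reduced) / df_loss
    residual_mean_square = ss_residual / df_residual
    return loss_mean_square / residual_mean_square

total_ss, ss_two_variable = 1420.59, 834.66
ss_residual = total_ss - ss_two_variable      # 585.93 on 22 - 2 - 1 = 19 df

f_drop_aptitude = loss_f_ratio(ss_two_variable, 522.94, ss_residual, 19)    # Table 103
f_drop_hs_average = loss_f_ratio(ss_two_variable, 638.33, ss_residual, 19)  # Table 104
print(f"F (drop aptitude) = {f_drop_aptitude:.2f}; F (drop H.S. average) = {f_drop_hs_average:.2f}")
```

For comparison, the tabled 5 per cent and 1 per cent points of F with 1 and 19 degrees of freedom are about 4.38 and 8.18.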

The deviation score calculations of the loss due to the elimination of a variable from a prediction scheme are simpler than the raw score calculation because fewer normal equations are involved. In the present example, in which single-variable regressions are involved, the previously developed formula which shows the sum of squares for the regression to be equal to

$$\frac{(\Sigma xy)^2}{\Sigma x^2}$$

can be applied. Therefore, the sum of squares for the high school grade-point average regression is

$$\frac{(\Sigma x_2y)^2}{\Sigma x_2^2} = \frac{(51.643)^2}{5.10} = 522.94$$

The sum of squares for the scholastic aptitude regression is

$$\frac{(\Sigma x_1y)^2}{\Sigma x_1^2} = \frac{(3325.364)^2}{17323.455} = 638.33$$

With these values Tables 103 and 104 can be reconstructed. 
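Testing whether a significant loss in ability to predict has occurred when one or more variables have been dropped can also be expressed in terms of correlation coefficients. The zero-order coefficients between the criterion and the high school grade-point averages and aptitude scores, respectively, and the multiple correlation coefficient for both variables are

High school averages: $r_{y2} = \sqrt{\frac{522.94}{1420.59}} = 0.61$

Scholastic aptitude scores: $r_{y1} = \sqrt{\frac{638.33}{1420.59}} = 0.67$

Both variables: $R_{y.12} = \sqrt{\frac{834.66}{1420.59}} = 0.77$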
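Each coefficient above is the square root of a regression sum of squares divided by the total sum of squares. The minimal Python sketch below reproduces them, assuming only the sums of squares from Tables 103 and 104; the dictionary keys are illustrative labels.

```python
import math

total_ss = 1420.59
regression_ss = {
    "r_y2": 522.94,    # high school grade-point average alone
    "r_y1": 638.33,    # scholastic aptitude score alone
    "R_y.12": 834.66,  # both predictors together
}
# r = sqrt(regression SS / total SS)
correlations = {label: math.sqrt(ss / total_ss) for label, ss in regression_ss.items()}
for label, r in correlations.items():
    print(f"{label}: {r:.2f}")
```

Because the two-variable regression sum of squares is the largest of the three, the multiple coefficient comes out at least as large as either zero-order coefficient.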


The multiple correlation coefficient, R, is never smaller than any of the zero-order coefficients, since adding a prediction variable to a regression equation cannot decrease the ability to predict the criterion. The gain may vary greatly in size.

[TABLE 105 (printed sideways; illegible in this copy). Caption, partly legible: analysis of the loss due to elimination of prediction variable(s), in terms of coefficients of correlation. Columns: source of variation, degrees of freedom, sum of squares, mean square.]

On occasion it is profitable to find partial correlation coefficients, i.e., the correlation coefficient between two variables when the individuals concerned are held constant with respect to one or more additional variables. Partial correlation coefficients can be found from the analysis of regression by applying the formula

$$r_{y1.23\cdots m} = \sqrt{\frac{\text{SS due to loss by elimination of Variable 1}}{\text{SS due to loss by elimination of Variable 1} + \text{SS of Multiple Residuals}}}$$

The left-hand side of the equation is interpreted as the partial correlation between the criterion Y and the prediction variable $X_1$, while all other prediction variables are held constant. The partial correlation coefficient between algebra scores and scholastic aptitude scores with high school grade-point averages held constant is

$$r_{y1.2} = \sqrt{\frac{311.72}{311.72 + 585.93}} = 0.59$$

The partial correlation between algebra scores and high school grade-point averages with aptitude scores held constant is

$$r_{y2.1} = \sqrt{\frac{196.33}{196.33 + 585.93}} = 0.50$$

The sign of both partial coefficients of correlation is positive, since the sign is always the same as that of the coefficient of the appropriate variable in the two-variable regression equation.
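The formula translates directly into a short Python sketch. It assumes the loss and residual sums of squares from Tables 103 and 104; `partial_r_from_ss` is a hypothetical helper name, and it attaches the sign of the variable's coefficient in the two-variable equation, following the rule just stated.

```python
import math

def partial_r_from_ss(loss_ss, residual_ss, coefficient):
    """Partial r between Y and a dropped predictor from the analysis of regression;
    the sign is that of the predictor's coefficient in the full equation."""
    r = math.sqrt(loss_ss / (loss_ss + residual_ss))
    return math.copysign(r, coefficient)

residual_ss = 585.93
r_y1_2 = partial_r_from_ss(311.72, residual_ss, 0.14607214)  # aptitude, H.S. average constant
r_y2_1 = partial_r_from_ss(196.33, residual_ss, 6.7563946)   # H.S. average, aptitude constant
print(round(r_y1_2, 2), round(r_y2_1, 2))  # → 0.59 0.5
```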


The significance of any partial correlation coefficient may also be tested by means of t, and conclusions from this test agree with those from the analysis of regression:

$$t = \frac{r_{y1.23\cdots m}\sqrt{N - m - 1}}{\sqrt{1 - r_{y1.23\cdots m}^2}}$$

where m is the number of prediction variables involved and the degrees of freedom are N − m − 1.

Using this formula to evaluate the significance of the partial correlation coefficient between algebra scores and scholastic aptitude scores with high school grade-point averages held constant, $r_{y1.2}$, the formula becomes

$$t = \frac{0.59\sqrt{22 - 3}}{\sqrt{1 - (0.59)^2}} = 3.19$$

and for $r_{y2.1}$ the formula becomes

$$t = \frac{0.50\sqrt{22 - 3}}{\sqrt{1 - (0.50)^2}} = 2.52$$

As was true of the F-values in the analysis of regression, both t-values, with 19 degrees of freedom, are significant. Except for rounding errors, these values equal the square roots of the corresponding F-values for the loss due to dropping a variable, shown in Tables 103 and 104.
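The correspondence between t and F can be verified with a minimal Python sketch; `partial_r_t` is a hypothetical helper name, and the inputs are the rounded partial coefficients and F-values given above.

```python
import math

def partial_r_t(r, n, m):
    """t = r * sqrt(N - m - 1) / sqrt(1 - r^2), with N - m - 1 degrees of freedom."""
    return r * math.sqrt(n - m - 1) / math.sqrt(1 - r * r)

n, m = 22, 2
for label, r, f in [("r_y1.2", 0.59, 10.11), ("r_y2.1", 0.50, 6.37)]:
    t = partial_r_t(r, n, m)
    print(f"{label}: t = {t:.2f}, sqrt(F) = {math.sqrt(f):.2f}")
```

For a single degree of freedom in the numerator, t² equals F; the small differences printed here come only from rounding the partial coefficients to two places.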


PARTIAL CORRELATIONS FROM CORRELATIONS OF LOWER ORDER


Partial correlations of any order may also be computed from correlation coefficients of lower order. It will be recalled that the correlation coefficient between two variables is designated as a zero-order correlation coefficient, since no variables are held constant. First-, second-, and third-order partial correlations are obtained by holding constant one, two, and three variables, respectively, and so on.

Returning to the illustration of algebra test scores predicted from high school grade-point average and scholastic aptitude scores, the zero-order correlations $r_{y1}$ and $r_{y2}$ have been shown to be 0.67 and 0.61 respectively. One other zero-order correlation coefficient is available from these data, i.e., $r_{12}$, the correlation coefficient between aptitude score and high school grade-point average. Since this correlation was not involved in testing hypotheses about the criterion, it would be computed only if the investigator were interested in relationships between the prediction variables or for the purpose of obtaining further partial correlation coefficients. The value of $r_{12}$ may be obtained by substituting the values for $\Sigma x_1x_2$, $\Sigma x_1^2$, and $\Sigma x_2^2$ into the formula

$$r_{12} = \frac{\Sigma x_1x_2}{\sqrt{(\Sigma x_1^2)(\Sigma x_2^2)}}$$

and is found to be

$$r_{12} = \frac{\Sigma x_1x_2}{\sqrt{(17323.455)(5.10)}} = 0.40$$
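The cross-product Σx1x2 needed for r12 is not legible in this copy. As an illustration only, the sketch below back-solves it from the first deviation-score normal equation, Σx1y = a1Σx1² + a2Σx1x2, using the two-variable coefficients quoted earlier, checks the result against the second normal equation, and then computes r12. The back-solved cross-product is a derived assumption, not a figure printed in the text.

```python
import math

# Deviation-score sums and two-variable coefficients quoted earlier in the text.
a1, a2 = 0.14607214, 6.7563946          # aptitude (X1), grade-point average (X2)
sum_x1_sq, sum_x2_sq = 17323.455, 5.10
sum_x1_y, sum_x2_y = 3325.364, 51.643

# Back-solve the cross-product from the first normal equation:
#   sum_x1_y = a1 * sum_x1_sq + a2 * sum_x1_x2
sum_x1_x2 = (sum_x1_y - a1 * sum_x1_sq) / a2

# The second normal equation should then balance (residual close to zero):
#   sum_x2_y = a1 * sum_x1_x2 + a2 * sum_x2_sq
check = a1 * sum_x1_x2 + a2 * sum_x2_sq - sum_x2_y

r12 = sum_x1_x2 / math.sqrt(sum_x1_sq * sum_x2_sq)
print(f"sum x1*x2 ~ {sum_x1_x2:.2f}, check = {check:.4f}, r12 = {r12:.2f}")
```

That the second normal equation balances is a useful sign that the quoted coefficients and sums are mutually consistent.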


For problems involving a criterion and two prediction variables with any one of the three held constant, the three possible first-order partial correlation coefficients may be expressed by the formulas:

$$r_{y1.2} = \frac{r_{y1} - r_{y2}r_{12}}{\sqrt{(1 - r_{y2}^2)(1 - r_{12}^2)}}$$

$$r_{y2.1} = \frac{r_{y2} - r_{y1}r_{12}}{\sqrt{(1 - r_{y1}^2)(1 - r_{12}^2)}}$$

$$r_{12.y} = \frac{r_{12} - r_{y1}r_{y2}}{\sqrt{(1 - r_{y1}^2)(1 - r_{y2}^2)}}$$

Substituting the appropriate data from the illustrative problem into the formulas, they become

$$r_{y1.2} = \frac{0.67 - (0.61)(0.40)}{\sqrt{[1 - (0.61)^2][1 - (0.40)^2]}} = 0.59$$

$$r_{y2.1} = \frac{0.61 - (0.67)(0.40)}{\sqrt{[1 - (0.67)^2][1 - (0.40)^2]}} = 0.50$$

$$r_{12.y} = \frac{0.40 - (0.67)(0.61)}{\sqrt{[1 - (0.67)^2][1 - (0.61)^2]}} = -0.01$$
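All three first-order formulas share one pattern, so a single helper covers them. In this minimal Python sketch `first_order_partial` is a hypothetical name; its arguments are the correlation of interest followed by the correlations of each of the two variables with the one held constant.

```python
import math

def first_order_partial(r_ab, r_ac, r_bc):
    """r_ab.c = (r_ab - r_ac * r_bc) / sqrt((1 - r_ac^2) * (1 - r_bc^2))."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

r_y1, r_y2, r_12 = 0.67, 0.61, 0.40   # zero-order values from the algebra example

r_y1_2 = first_order_partial(r_y1, r_y2, r_12)   # aptitude, H.S. average constant
r_y2_1 = first_order_partial(r_y2, r_y1, r_12)   # H.S. average, aptitude constant
r_12_y = first_order_partial(r_12, r_y1, r_y2)   # the two predictors, criterion constant
print(f"{r_y1_2:.2f} {r_y2_1:.2f} {r_12_y:.2f}")
```

The first two results agree with the values obtained from the analysis of regression, which illustrates that the two approaches are equivalent.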
The general formula for partial correlations of any order obtained from correlations of lower order is

$$r_{y1.23\cdots m} = \frac{r_{y1.23\cdots(m-1)} - r_{ym.23\cdots(m-1)}\,r_{1m.23\cdots(m-1)}}{\sqrt{[1 - r_{ym.23\cdots(m-1)}^2][1 - r_{1m.23\cdots(m-1)}^2]}}$$

It should be noted that the partial correlations to the right of the equality sign are always one order lower than the partial correlation to be computed.

The use of the formula for obtaining partial correlations of various orders will be shown for the data in Table 106. The data in this table were collected for the purpose of ascertaining how well teaching success could be predicted from data gathered about the prospective teacher while still in college.

TABLE 106. Relationship of Various Factors to Teaching Success of 100 High School Teachers (Coefficients of Correlation)

VARIABLE                            1      2      3      4      5      Y
1. Intelligence                          .693   .521   .269   .074   .269
2. General Academic Success                     .895   .420   .269   .069
3. Academic Success in Major                           .522   .261  −.048
4. Methods Course Mark                                        .294   .022
5. Student Teaching Mark                                             .215
Y. Teaching Success

With these 100 teachers the zero-order coefficient of correlation found between teaching success and intelligence was 0.269, and between teaching success and student teaching mark was 0.215. If the relationship between teaching success and student teaching mark with intelligence constant is desired, the formula used is as follows:


$$r_{y5.1} = \frac{r_{y5} - r_{15}r_{y1}}{\sqrt{(1 - r_{15}^2)(1 - r_{y1}^2)}}$$

Substituting,

$$r_{y5.1} = \frac{0.215 - (0.074)(0.269)}{\sqrt{[1 - (0.074)^2][1 - (0.269)^2]}} = 0.203$$


In a similar manner the other first-order partial correlations in which intelligence is held constant may be obtained as follows:

$r_{y2.1} = -0.169$    $r_{y3.1} = -0.229$    $r_{y4.1} = -0.054$    $r_{y5.1} = 0.203$
$r_{23.1} = 0.868$    $r_{24.1} = 0.336$    $r_{25.1} = 0.303$
$r_{34.1} = 0.465$    $r_{35.1} = 0.261$    $r_{45.1} = 0.285$


If second-order partial correlations are desired between teaching success and student teaching mark with intelligence and general academic success held constant, the formula to be used is

$$r_{y5.12} = \frac{r_{y5.1} - r_{25.1}r_{y2.1}}{\sqrt{(1 - r_{25.1}^2)(1 - r_{y2.1}^2)}}$$

Substituting,

$$r_{y5.12} = \frac{0.203 - (0.303)(-0.169)}{\sqrt{[1 - (0.303)^2][1 - (-0.169)^2]}} = 0.271$$

Other second-order partials, together with third-order and fourth-order partials, may be solved in a similar fashion and will be found as follows:

Second-order:  $r_{y3.12} = -0.168$    $r_{y4.12} = 0.003$    $r_{y5.12} = 0.271$
               $r_{34.12} = 0.371$    $r_{35.12} = -0.004$    $r_{45.12} = 0.204$
Third-order:   $r_{y4.123} = 0.071$    $r_{y5.123} = 0.274$    $r_{45.123} = 0.221$
Fourth-order:  $r_{y5.1234} = 0.266$
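The general formula is naturally recursive: each partial is built from three partials one order lower. The Python sketch below applies that recursion to Table 106. `partial` and `ZERO_ORDER` are hypothetical names, and one assumption should be flagged: $r_{45}$ is taken as .294, the value that reproduces the printed $r_{45.1} = 0.285$ and $r_{45.12} = 0.204$.

```python
import math
from functools import lru_cache

# Zero-order correlations from Table 106; "y" is teaching success.
# Assumption: r_45 = .294, the value consistent with the printed r_45.1 and r_45.12.
ZERO_ORDER = {
    ("1", "2"): 0.693, ("1", "3"): 0.521, ("1", "4"): 0.269, ("1", "5"): 0.074,
    ("1", "y"): 0.269, ("2", "3"): 0.895, ("2", "4"): 0.420, ("2", "5"): 0.269,
    ("2", "y"): 0.069, ("3", "4"): 0.522, ("3", "5"): 0.261, ("3", "y"): -0.048,
    ("4", "5"): 0.294, ("4", "y"): 0.022, ("5", "y"): 0.215,
}

@lru_cache(maxsize=None)
def partial(a, b, held=()):
    """r_ab.held, built recursively from correlations one order lower:
    r_ab.(rest, m) = (r_ab.rest - r_am.rest * r_bm.rest)
                     / sqrt((1 - r_am.rest^2) * (1 - r_bm.rest^2))"""
    if not held:
        return ZERO_ORDER[(a, b)] if (a, b) in ZERO_ORDER else ZERO_ORDER[(b, a)]
    rest, m = held[:-1], held[-1]
    r_ab = partial(a, b, rest)
    r_am = partial(a, m, rest)
    r_bm = partial(b, m, rest)
    return (r_ab - r_am * r_bm) / math.sqrt((1 - r_am ** 2) * (1 - r_bm ** 2))

for held in [("1",), ("1", "2"), ("1", "2", "3"), ("1", "2", "3", "4")]:
    print(f"r_y5.{''.join(held)} = {partial('y', '5', held):.3f}")
```

Carried at full precision, a few intermediate values differ in the third decimal from the printed ones (for example, $r_{34.12}$ comes out near 0.369 rather than 0.371), because the text rounds each intermediate coefficient to three places. The order in which variables are held constant does not affect the result.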


MULTIPLE CORRELATIONS FROM ZERO-ORDER AND PARTIAL CORRELATIONS


As was true of partial correlation coefficients, it is also possible to obtain multiple correlation coefficients from known values of zero-order and partial correlations. With three variables, the formula for $R_{y.12}$ is

$$R_{y.12} = \sqrt{1 - (1 - r_{y1}^2)(1 - r_{y2.1}^2)}$$

which can also be written as

$$R_{y.12} = \sqrt{\frac{r_{y1}^2 + r_{y2}^2 - 2r_{y1}r_{y2}r_{12}}{1 - r_{12}^2}}$$

Substituting the data from the problem of predicting algebra scores from known values of scholastic aptitude and high school grade-point averages, the formulas become

$$R_{y.12} = \sqrt{1 - [1 - (0.67)^2][1 - (0.50)^2]} = 0.766$$

$$R_{y.12} = \sqrt{\frac{(0.67)^2 + (0.61)^2 - 2(0.67)(0.61)(0.40)}{1 - (0.40)^2}} = 0.767$$

The two results differ in the third place only because $r_{y2.1}$ has been rounded to 0.50.
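The two forms are algebraically identical, which a short Python sketch can confirm when $r_{y2.1}$ is carried at full precision instead of being rounded. The variable names are illustrative.

```python
import math

r_y1, r_y2, r_12 = 0.67, 0.61, 0.40

# First-order partial r_y2.1 at full precision (the text rounds it to 0.50).
r_y2_1 = (r_y2 - r_y1 * r_12) / math.sqrt((1 - r_y1 ** 2) * (1 - r_12 ** 2))

R_from_partial = math.sqrt(1 - (1 - r_y1 ** 2) * (1 - r_y2_1 ** 2))
R_from_zero_order = math.sqrt(
    (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
)
print(f"{R_from_partial:.3f} {R_from_zero_order:.3f}")
```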




The general formula for obtaining multiple correlation coefficients from zero-order and partial correlation coefficients with any number of variables involved is

$$R_{y.12\cdots m} = \sqrt{1 - (1 - r_{y1}^2)(1 - r_{y2.1}^2)(1 - r_{y3.12}^2)\cdots(1 - r_{ym.12\cdots(m-1)}^2)}$$

where m is the number of prediction variables involved.

Returning to the example of the prediction of teaching success from known values of intelligence, general academic success, academic success in major, methods course mark, and student teaching mark, if a measure is wanted indicating how well teaching success can be predicted from these known characteristics, the formula will be

$$R_{y.12345} = \sqrt{1 - (1 - r_{y1}^2)(1 - r_{y2.1}^2)(1 - r_{y3.12}^2)(1 - r_{y4.123}^2)(1 - r_{y5.1234}^2)}$$

Substituting and solving for R,

$$R_{y.12345} = 0.436$$
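The product form is easy to evaluate once the chain of partials is known. This minimal Python sketch uses the three-place partials printed earlier; `multiple_r` is a hypothetical helper name, and the partials must be supplied in the order $r_{y1}, r_{y2.1}, r_{y3.12}, \ldots$

```python
import math

def multiple_r(successive_partials):
    """R = sqrt(1 - prod(1 - r^2)) over r_y1, r_y2.1, r_y3.12, ... in that order."""
    unexplained = 1.0
    for r in successive_partials:
        unexplained *= 1 - r * r   # proportion of variance still unexplained
    return math.sqrt(1 - unexplained)

# Printed values: r_y1, r_y2.1, r_y3.12, r_y4.123, r_y5.1234
R = multiple_r([0.269, -0.169, -0.168, 0.071, 0.266])
print(f"R_y.12345 = {R:.3f}")
```

Using fully precise partials instead of the three-place values changes R only in the fourth decimal place.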


It has been shown that two approaches to problems involving partial and multiple correlations are possible and that, except for errors in rounding, the methods are equivalent. The distinct advantage of the analysis of regression approach lies in the ease of computation and the amount of time saved, particularly when deviation scores are used. In problems involving more than three variables, the saving in computation time becomes increasingly large. There may be times, however, when the investigator is interested in descriptive relationships involving only a few variables, and the approach through correlations of lower order may then be used.


Exercises 


1. A university instructor wished to predict final marks in a freshman English course from one or more of the following variables: raw scores on a scholastic aptitude test, high school grade-point averages, and scores on a pretest in English grammar. Data for 45 freshmen were tabulated.

a. Compute all possible zero-order coefficients of correlation. On the basis of these values, which prediction variable will be the best predictor? Which combination of two variables do you think will be the most effective? Why?

b. Find the regression equation relating the criterion, Y, with a combination of all three variables. Use either the raw score or deviation score method.

c. In the foregoing three-variable prediction scheme, which of the three variables contributed the most to the effectiveness of the prediction? Why cannot the effectiveness of each variable be inferred from a direct comparison of the respective coefficients of the variables?

d. Test the null hypothesis that final marks in freshman English cannot be predicted from a combination of scholastic aptitude test scores, high school grade-point averages, and English grammar pretest scores. Compute and interpret the multiple coefficient of correlation.




e. Compute the standard error of estimate. Interpret your result. 

f. Would a significant loss in ability to predict result if the scholastic aptitude test scores ($X_1$) were eliminated from the prediction battery?

g. Find $r_{y1.23}$. Interpret your answer to the foregoing question in terms of this value.

h. Would a significant loss in ability to predict result if the English pretest scores ($X_3$) were dropped from the three-variable prediction equation? Compute $r_{y3.12}$.

i. If you determine the regression coefficients by solving the normal equa- 
tions simultaneously, is any advantage gained by designating a prediction 
variable which is likely to be dropped from a multiple regression equation 
as X,? 

j. For all practical purposes, would a single prediction variable scheme 
involving only the high school grade-point averages be just as effective 
in the prediction of freshman English final marks as the three variable 
prediction equation? Can you interpret the necessary test of significance 
in terms of a partial coefficient of correlation? 

          FRESHMAN ENGLISH   SCHOLASTIC   HIGH SCHOOL    ENGLISH
          FINAL MARKS        APTITUDE     GRADE-POINT    PRETEST
          (PER CENT)         SCORES       AVERAGES       SCORES
STUDENT   Y                  X1           X2             X3

1 76 132 2.88 21 
2 92 147 3.73 40 
3 68 67 2.24 14 
4 60 128 2.46 18 
5 76 103 2.70 26 
6 88 108 2.17 13 
7 72 101 2.53 20 
8 88 125 3.23 28 
9 96 129 3.53 30 
10 96 127 2.59 15 
11 64 93 1.44 15 
12 96 142 3.25 39 
13 88 113 2.81 15 
14 40 93 1.81 8
15 96 105 2.88 20 
16 52 99 2.50 13 
17 68 124 2.47 27 
18 68 105 2.58 25 
19 44 144 2.16 20 
20 92 106 3.13 22 
21 52 100 2.84 22 
22 80 98 3.43 13 
23 52 83 2.16 21 
24 92 99 2.60 31 
25 76 107 2.33 13 
26 92 131 2.83 29 
27 76 98 2.69 20 
28 88 117 2.64 33 
29 88 156 3.53 26 
30 88 117 3.03 25 


STATISTICAL METHODS 


254 
FRESHMAN ENGLISH SCHOLASTIC HIGH SCHOOL 
FINAL MARKS APTITUDE GRADE-POINT ENGLISH 
(PER CENT) SCORES AVERAGES PRETEST SCORES 
STUDENT dÉ Ni X: Xa 
31 96 105 2.59 34 
32 72 108 2.96 21
33 64 94 2.78 8 
34 96 152 3.75 43 
35 48 115 2.13 12 
36 76 132 2.75 15 
37 88 90 3.71 24 
38 88 104 3.37 24 
39 84 129 3.31 18 
40 88 102 3.38 12 
41 48 98 2.28 20 
42 80 124 2.20 21 
43 96 121 3.61 37 
44 92 129 3.41 23 
45 64 104 2.13 18 


2. In an attempt to predict scores (Y) on an application-of-principles final examination in a general psychology class, the instructor compiled reading comprehension scores ($X_1$), the linguistic scores on the American Council on Education Psychological Examination ($X_2$), the quantitative scores on the same test ($X_3$), and prior college achievement scores ($X_4$). The values for 71 students are summarized below:

$\Sigma Y = 4{,}274$    $\Sigma Y^2 = 274{,}034$    $\Sigma X_1Y = 109{,}495$    $\Sigma X_1X_3 = 77{,}555$
$\Sigma X_1 = 1{,}690$    $\Sigma X_1^2 = 52{,}007$    $\Sigma X_2Y = 290{,}969$    $\Sigma X_1X_4 = 4{,}435.88$
$\Sigma X_2 = 4{,}653$    $\Sigma X_2^2 = 320{,}790$    $\Sigma X_3Y = 191{,}720$    $\Sigma X_2X_3 = 207{,}676$
$\Sigma X_3 = 3{,}095$    $\Sigma X_3^2 = 141{,}305$    $\Sigma X_4Y = 11{,}175.58$    $\Sigma X_2X_4 = 11{,}939.56$
$\Sigma X_4 = 180.18$    $\Sigma X_4^2 = 483.6436$    $\Sigma X_1X_2 = 116{,}627$    $\Sigma X_3X_4 = 7{,}947.09$

a. Find the multiple regression equation between Y and a combination of 
all prediction variables. Compute and interpret an analysis of multiple 
regression. 

b. Determine whether it is possible to drop the reading comprehension scores from the four-variable prediction equation without incurring a significant loss in ability to predict. Was there an appreciable decrease in the size of the multiple coefficient of correlation?

c. Systematically test the prediction variables in the multiple regression equation so as to reduce the equation to a combination of the prediction variables which individually cannot be dropped without causing a significant loss in prediction efficiency.


3. An investigator found the following intercorrelations between final marks in a beginning sewing class, finger dexterity test scores, paper-and-pencil pretest scores, and performance pretest scores. N = 165.

                          FINAL     FINGER           PAPER-PENCIL   PERFORMANCE
                          MARKS     DEXTERITY TEST   PRETEST        PRETEST
VARIABLES                 Y         X1               X2             X3
Final Marks                         −0.293           0.446          0.415
Finger Dexterity Test                                −0.039         −0.113
Paper-Pencil Pretest                                                0.640
Performance Pretest

a. Find $r_{y1.2}$, $r_{y1.3}$, $r_{y2.1}$, $r_{y2.3}$, $r_{y3.1}$, and $r_{y3.2}$.

b. Find $r_{y1.23}$, $r_{y2.13}$, $r_{y3.12}$. Judging from these values, which prediction variables, if any, would you attempt to drop from the battery if you wished to reduce the number of prediction variables?

c. Find $R_{y.123}$.


14

Biserial Correlation and Discriminant Analysis


Both regression equations and coefficients of correlation have been 
discussed as suitable indications of linear relationship between two nu- 
merical variables. In these procedures the assumption has been made that 
both the criterion and the prediction variables are available in such de- 
tail that no two observations of identical magnitude in either variable 
are obtained. Usually no such precision of measurement is available. 
Fortunately, some duplication in magnitude among observations is of 
little importance unless the number of reported magnitudes is less than 
five or ten. Thus, the methods of correlation and regression analysis 
previously described are usually satisfactory. 

There are, however, many situations in educational and psychological 
research in which the observations, of necessity, are coarsely grouped in 
one or both of the variables needed for correlation or regression analysis. 
Perhaps the most frequently occurring situation is one in which one variable is expressed numerically and the other appears only as a dichotomy. Discriminant analysis and biserial correlation are frequently needed by the research worker for the analysis of such data. The former follows the methods of regression analysis, whereas the latter attempts to yield an estimate of the relationship between two variables when one occurs in a dichotomy.

Discriminant analysis, then, may be treated as a regression analysis with similar necessary assumptions. As in all regression analysis, one of the variables must be chosen as the criterion and the other, or others, used as prediction variables. This analysis will provide suitable tests of significance. On the other hand, if a single index is needed for describing the relationship between the two characteristics, it is unnecessary to distinguish between the criterion and the prediction variables. For the purpose of obtaining a coefficient of correlation which can be interpreted as the usual product-moment correlation is interpreted, a suitable formula is needed which yields a coefficient similar to that obtained when both variables are numerically expressed in not too few segments. Although existing formulas for biserial correlation are not entirely satisfactory, only in unusual situations will serious error result from interpreting a coefficient of biserial correlation as a product-moment coefficient. The method for computing a biserial coefficient will be considered later.


ASSUMPTIONS UNDERLYING BISERIAL CORRELATION


Several assumptions are necessary in the development and application of the formulas. Probably the most important of the assumptions is that the dichotomous characteristic is actually a normally distributed variable. A few examples will bring this assumption into bold relief. Some high school graduates go to college and some do not. At first thought this classification represents an either-or situation. On the other hand, it is not unreasonable to assume that the tendency to go to college is a characteristic that is normally distributed, with the tendency being so great that actually 20 per cent enroll in college and 80 per cent do not. With some individuals the tendency is so neutral that it is difficult to forecast the group in which they will probably be found. The problem of indicating the relationship between intelligence quotients and tendency to go to college, in which the latter has been dichotomized into either attendance or nonattendance, does not appear to violate the assumption of a normally distributed variable dichotomously expressed.

Another important use of biserial r is in test item validation in which the criterion of validity is considered to be internal consistency, i.e., the relationship of total score to a correct-incorrect response to any given item. Here again, at first thought, the correct-incorrect response appears to be disqualified as a continuous variable. However, there seems little justification for the inclusion of an item in a test except for the purpose of estimating some student characteristic that, without undue stretch of the imagination, can be assumed to be normally distributed. Justification is lacking for including any test item which is not designed to estimate, roughly though it may be, some variable characteristic which the total score presumably reveals.

An item-by-item computation of biserial r with total score, then, reveals some evidence of how well a normally distributed characteristic is being measured by a crude response such as a correct-incorrect classification of response. The assumption of a dichotomous response to a normally distributed variable in item analysis does not appear to be untenable unless the generally accepted idea of normality in the distribution of psychological traits is rejected, which the authors are unwilling to do.

Perhaps one of the greatest single educational uses of biserial r is associated with the issue of attrition-survival of students in an educational program or an individual course. Nor is the attrition problem unique to education. The armed forces, as well as business enterprises, have been keenly aware of the problem of attrition and the psychological basis for its prediction. It is necessary to assume in studies of attrition that the tendency to drop out or to complete an educational program is a normally distributed characteristic which can be measured no more accurately than by attrition-survival data.

Certain bases of classification, on the other hand, such as sex, geographical location, and occupational status, do not lend themselves to acceptance as a single normally distributed variable.

There are borderline cases, of course, in which consensus cannot be obtained. For example, whenever individuals are grouped according to whether they have been inoculated for some specific disease, the assumption may be open to question. Some may assume that any given individual either has or has not been inoculated, whereas others may assume that the degree of effective immunization is a normally distributed characteristic which is measured no more accurately than by the yes-no dichotomy of inoculation.

Whenever the foregoing assumption of a normally distributed variable classified into a dichotomy appears tenable, the additional assumption must be made that the relationship between the numerical variable and the dichotomous variable is linear. This additional assumption is identical with that made whenever the product-moment coefficient of correlation is desired.

The formula for biserial correlation yields reasonably accurate estimates of the product-moment correlation which would result if more sensitive measurements were available than those expressed in a dichotomy, whenever the numbers of cases in each category of the dichotomy are approximately equal. As the percentage in one of the categories becomes smaller, less confidence can be placed in the coefficient obtained from the formula. Until a formula with more general application is developed, from either theoretical or empirical considerations, the computation of biserial correlation with less than 5 per cent to 10 per cent in either of the two categories is open to question. In spite of the foregoing limitations in the use of biserial correlation, problems in psychological and educational situations in which the technique seems applicable are numerous.


COMPUTATION OF BISERIAL CORRELATION 


Computation of biserial correlation offers no difficulty. The formula which has usually been employed is

$$r_{bis} = \frac{d}{\sigma}\left(\frac{pq}{z}\right)$$

where

$r_{bis}$ = biserial r
d = difference between the categories in means of the numerical variable
σ = standard deviation of the numerical variable in the total group being studied (not a population estimate), i.e., $\sigma = \sqrt{\Sigma x^2 / N}$
p = proportion of cases in one of the dichotomous categories
q = proportion of cases in the other of the dichotomous categories
z = height of the ordinate dividing the normal curve of unit area into p and q parts

Tables for values of pq/z (see Appendix) are readily available, thus facilitating solutions of the formula. The foregoing formula offers no difficulty in its application and is quite satisfactory whenever the relationship is low, but it gives an overestimate whenever the relationship is high. Another method for obtaining a coefficient of correlation is to first compute a point biserial coefficient of correlation and adjust the obtained value by a correction factor for coarse grouping,¹ which varies with the magnitude of the obtained point biserial coefficient. The formula for point biserial r is

$$r_{pb} = \frac{d}{\sigma}\sqrt{pq}$$


which is based upon the usually unacceptable assumption that the characteristic appearing in either segment of a dichotomy is uniform within that segment. The correction values are for the purpose of transferring to the assumption that the dichotomous variable is a single normal distribution no more accurately evaluated here than by a two-point scale.

TABLE 107. Intelligence Quotients of High School Pupils

                                        INTELLIGENCE QUOTIENTS
GRADUATED   NUMBER   PER CENT       ΣX         MEAN          ΣX²
Yes           156      78.79      16,317    104.59615
No             42      21.21       3,902     92.90476
Total         198     100.00      20,219    102.11616    2,098,253

As an application of the technique, information is available from 198 pupils entering high school. Of this group, 156 graduated and 42 did not. Is this tendency to graduate from high school related to intelligence quotient? The needed data for solution are shown in Table 107. Both methods of computation will be described.

¹ A table of such correction values is shown in the Appendix. For biserial r, the table should be entered under the heading of two segments. Some justification for the use of these correction values is described in the chapter, "Other Techniques of Correlation Analysis."


By the traditional use of the formula, $r_{bis} = \frac{d}{\sigma}\left(\frac{pq}{z}\right)$, an estimate of the relationship between the I.Q. and tendency to graduate, upon substitution, yields

$$r_{bis} = \frac{11.691}{13.02}\left[\frac{(0.7879)(0.2121)}{0.2899}\right] = 0.518$$

By the more conservative method here proposed, where $r_{pb}$ represents point biserial correlation,

$$r_{pb} = \frac{11.691}{13.02}\sqrt{(0.7879)(0.2121)} = 0.367$$

When the point biserial correlation is adjusted by the appropriate correction factor for coarse grouping,

$$r = (0.367)(1.218) = 0.447$$


The latter coefficient, no doubt, is a more accurate estimate of the cor- 
relation that would have been obtained in this group if a more sensitive 
measurement than a two-point scale were available for tendency to 
graduate. 
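Both coefficients can be recomputed from the summary data of Table 107 with a short Python sketch. It assumes Python 3.8 or later, where `statistics.NormalDist` supplies the normal ordinate z in place of the Appendix table; the correction factor 1.218 is simply the value quoted in the text, since the Appendix table itself is not reproduced here.

```python
import math
from statistics import NormalDist

# Summary data from Table 107 (I.Q. by whether the pupil graduated).
n_yes, n_no = 156, 42
sum_yes, sum_no = 16317, 3902
sum_sq_total = 2098253

n = n_yes + n_no
p, q = n_yes / n, n_no / n
d = sum_yes / n_yes - sum_no / n_no                  # difference in group means
mean_total = (sum_yes + sum_no) / n
sigma = math.sqrt(sum_sq_total / n - mean_total**2)  # divisor N, not N - 1

# z: height of the unit-normal ordinate at the point cutting off p and q.
unit_normal = NormalDist()
z = unit_normal.pdf(unit_normal.inv_cdf(p))

r_bis = (d / sigma) * (p * q / z)       # traditional biserial formula
r_pb = (d / sigma) * math.sqrt(p * q)   # point biserial
r_adjusted = r_pb * 1.218               # correction factor quoted in the text
print(f"sigma={sigma:.2f} z={z:.4f} r_bis={r_bis:.3f} r_pb={r_pb:.3f} adjusted={r_adjusted:.3f}")
```

Note that σ here uses N rather than N − 1, as the definition of σ above requires.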

Thus tendency to graduate is related to intelligence quotient. Since the intelligence quotient is, on the average, higher for the graduates than for the nongraduates, the usual practice is to assign a positive sign to the correlation. Once the conclusion has been drawn that the higher the intelligence quotient, the greater the tendency to graduate, the sign of the correlation has served its purpose.

In the foregoing example, perhaps no one would be interested in evi- 
dence indicating that the relationship is different from zero in a popula- 
tion from which the 198 students might be considered a random sample. 
If for any reason such a test of significance is desired, a t-test of the 
difference between two means may be used or, if more convenient, its 
mathematical equivalent in terms of the coefficient of biserial correlation 
when computed by the traditional formula: 


$$t = \frac{r_{bis}\left(\frac{z}{\sqrt{pq}}\right)\sqrt{N - 2}}{\sqrt{1 - r_{bis}^2\left(\frac{z^2}{pq}\right)}}$$

In the foregoing example $z/\sqrt{pq} = 0.709$. When the substitutions are made in the formula, a t-value of 5.54 with 196, i.e., $(N - 2)$, degrees of freedom is found, which indicates that the relationship is different from zero far beyond the evidence required for the 1 per cent level. If the point




biserial coefficient is obtained, it may be substituted in the usual formula for t for a product-moment correlation, i.e.,

$$t = \frac{r_{pb}\sqrt{N - 2}}{\sqrt{1 - r_{pb}^2}}$$

which of course is mathematically identical to the previously shown t-formula. Thus, upon substitution,

$$t = \frac{0.367\sqrt{198 - 2}}{\sqrt{1 - (0.367)^2}}$$


With this latter approach, significance can be evaluated, when more 
convenient, by consulting tabled values of the product-moment correla- 
tion coefficients at the 5 per cent and 1 per cent levels of significance for 
various degrees of freedom. Such tabled values can be found in the 
Appendix. 
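The equivalence of the two t-formulas can be checked numerically. In this sketch (function names hypothetical) the biserial form simply converts r_bis to a point biserial coefficient by the factor z/√(pq) and then applies the product-moment test. Starting from the rounded r_pb = 0.367, it gives t of about 5.52, a little below the 5.54 quoted in the text, a gap presumably due to rounding of the inputs.

```python
import math


def t_for_r(r: float, n: int) -> float:
    """t = r sqrt(N - 2) / sqrt(1 - r^2), referred to N - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_for_biserial(r_bis: float, n: int, z_over_root_pq: float) -> float:
    """Same test written for traditional biserial r: convert to point biserial first."""
    return t_for_r(r_bis * z_over_root_pq, n)


t = t_for_r(0.367, 198)  # point biserial r from the I.Q. example, 196 df
print(f"t = {t:.2f} with {198 - 2} df")
```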

As was previously mentioned, one method used in item analysis in- 
volves biserial r. As an example of the application, a test of 160 items 
was administered to 400 students. For these students the mean score was 
93.60 and the standard deviation was 21.20. On item 25 of this test, 240 
students responded correctly and 160 students responded incorrectly. The 
mean score on the test for students who responded correctly to item 25 
was 97.25, whereas the mean score for those responding incorrectly was 
88.12. Substituting in the formula 


$$r_{bis} = \frac{d}{\sigma}\left(\frac{pq}{z}\right) = \frac{97.25 - 88.12}{21.20}\left[\frac{(0.6)(0.4)}{0.3862}\right] = 0.268$$


Where many items are to be so analyzed, it may be more convenient to use the mathematically identical formula

$$r_{bis} = \frac{d}{\sigma}\left(\frac{p}{z}\right)$$

where

d = mean for students responding correctly to the item minus mean for all students

p = proportion responding correctly

Thus an identical correlation is found upon substitution in the formula:

$$r_{bis} = \frac{97.25 - 93.60}{21.20}\left(\frac{0.6}{0.3862}\right) = 0.268$$
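Why the two formulas agree: the mean of all students is $M_t = pM_p + qM_q$, so $M_p - M_t = q(M_p - M_q)$, and multiplying by $p/z$ gives the same result as multiplying $M_p - M_q$ by $pq/z$. The sketch below (hypothetical helper names, standard-library normal curve in place of the printed table) computes item 25 both ways.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ordinate_at_split(p: float) -> float:
    """Height of the unit normal curve at the point dividing it into p and 1 - p."""
    return _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(1.0 - p))


def biserial_two_means(mean_p, mean_q, sigma, p):
    """r_bis = ((M_p - M_q) / sigma) * (p q / z)."""
    q = 1.0 - p
    return (mean_p - mean_q) / sigma * (p * q / ordinate_at_split(p))


def biserial_one_mean(mean_p, mean_total, sigma, p):
    """Equivalent form r_bis = ((M_p - M_t) / sigma) * (p / z)."""
    return (mean_p - mean_total) / sigma * (p / ordinate_at_split(p))


# Item 25: 240 of 400 students answered correctly, so p = 0.6.
r1 = biserial_two_means(97.25, 88.12, 21.20, 0.6)
r2 = biserial_one_mean(97.25, 93.60, 21.20, 0.6)
print(round(r1, 3), round(r2, 3))
```

The two results differ only in the fourth decimal place, because the total mean 93.60 is itself rounded.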


The coefficient here reported has been computed from the traditional formula for biserial r, which may be converted to a point biserial correlation by multiplying by $z/\sqrt{pq}$. The resulting value may then be adjusted for coarse grouping by the usual correction factor. Thus

$$r_{pb} = 0.268\left(\frac{0.3862}{\sqrt{(0.6)(0.4)}}\right) = 0.211$$

which yields, upon consulting the table of correction factors,

r = (0.211)(1.242) = 0.262

which does not differ greatly from the value obtained by the traditional biserial formula.
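The conversion from biserial to point biserial r, followed by the coarse-grouping adjustment, can be sketched as below. The correction factor 1.242 is not computed here; it is simply the value the text reads from its table for this split and is passed in as a constant. The function name is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def biserial_to_point_biserial(r_bis: float, p: float) -> float:
    """Multiply a traditional biserial r by z / sqrt(pq) to obtain point biserial r."""
    q = 1.0 - p
    z = _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(1.0 - p))
    return r_bis * z / (p * q) ** 0.5


r_pb = biserial_to_point_biserial(0.268, 0.6)
COARSE_GROUPING_FACTOR = 1.242  # read from the book's table of correction factors, not derived
r_adjusted = r_pb * COARSE_GROUPING_FACTOR
```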

Perhaps the area in which biserial correlation has the widest applica- 
tion is attrition-survival studies. When applied to problems of college 
student personnel, in many cases, the investigator is confronted with a 
choice among a number of definitions of attrition-survival. A recent study 
of attrition-survival among engineering students brought the problem of 
definition of attrition-survival to the foreground. 

Attrition could be defined in terms of engineering freshmen who 
dropped by the wayside before graduation in engineering. With such a 
definition it would be necessary to assemble information and then wait 
for four or five years to count the survivals. A second possibility would 
involve taking data four or five years old and noting attrition-survival 
at the present time. This unavoidable delay suggested some other defini- 
tion. Since it is well known that first-year attrition is much larger than 
for later years, it was decided to make the definition of attrition-survival 
in terms of sophomore status. Three possible definitions were suggested:


Definition A: Beginning sophomore year
    Attrition: Not in school, scholastic achievement ignored
    Survival: In school, scholastic achievement ignored
Definition B: Beginning sophomore year
    Attrition: Out-of-school with poor scholastic record
    Survival: (a) In-school with good record
              (b) In-school with poor record
              (c) Out-of-school with good record
Definition C: Beginning sophomore year
    Attrition: Scholastic difficulty either in or out of school
    Survival: No scholastic difficulty either in or out of school


The choice of a suitable criterion then had to be determined in terms of two considerations: the purpose of the study and the sharpness of the criterion. The first consideration is, of course, paramount. To note the difference in the sharpness of the criterion, biserial correlation was found for each definition with (1) the American Council on Education Psychological Examination score, (2) an English placement test score, and (3) a mathematics aptitude score. The obtained coefficients of biserial correlation are shown in Table 108. An inspection of the data reveals so much similarity among the results with the three definitions of attrition-survival that the choice of a criterion seems to be of relatively small importance in this instance. From the evidence in the table concerning these engineering students, it appears that the criterion could well result from the definition which represents the most feasible method of assembling the necessary data, which, in this case, is in- and out-of-school status without regard to scholastic attainment.


TABLE 108. Coefficients of Biserial Correlation with Various Criteria of Attrition-Survival

                             ATTRITION-SURVIVAL DEFINITION
TEST                            A        B        C
A.C.E.                        0.326    0.369    0.349
English Placement             0.325    0.343    0.495
Mathematics Aptitude          0.374    0.413    0.408

In summary, biserial correlation represents an estimate of the product- 
moment correlation which may often be employed when it is not possible 
to evaluate one of the variables more accurately than by a dichotomous 
classification. The coefficient of correlation may be estimated by utilizing the traditional formula for biserial r, i.e., $r_{bis} = \frac{d}{\sigma}\left(\frac{pq}{z}\right)$, or by the more conservative estimate obtained by computing the point biserial correlation followed by a correction for coarse grouping. Whichever formula is judged most satisfactory for obtaining a descriptive measure of relationship between two variables, one of which is expressed as a dichotomy, the results of the test of significance are identical for each method.


THE DISCRIMINANT EQUATION


The biserial correlation coefficient is satisfactory for determining the relationship between a dichotomized variable and one numerical variable. However, it is frequently desirable to predict a dichotomy from several numerical variables. Individuals responsible for counseling personnel, for example, want to base judgment on more evidence than that furnished by any single variable.

Just as multiple regression yields appropriate weights for utilizing more than one variable in predicting a numerical criterion, so also an equation can be used in predicting a dichotomy. The latter equation may be called a discriminant equation. A coefficient of multiple correlation can be easily obtained, as shown in Chapter 13, from a multiple regression equation. In a similar fashion, a multiple biserial R can be obtained from a discriminant equation.

A discriminant function, originally developed by Fisher,¹ has received considerable use in ascertaining appropriate weights for a series of variables yielding maximum separation in two groups, each of which is assumed to be normally distributed. Modification of the discriminant function has been made here only to the extent necessary to agree with the assumption that the distribution in which the dichotomy exists is a single normally distributed variable.

¹ R. A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, 7:179-188, 1936.

The discriminant equation may be expressed as $v = a_1x_1 + a_2x_2 + a_3x_3 + \cdots + a_mx_m$, where $x_1$, $x_2$, $x_3$, and so forth are numerical variables and $a_1$, $a_2$, $a_3$, and so forth are the coefficients. The coefficients for the discriminant equation can be found by solving a series of simultaneous equations similar to the normal equations used in multiple regression analysis. These equations are as follows:

$$\begin{aligned}
Nzd_1 &= a_1\Sigma x_1^2 + a_2\Sigma x_1x_2 + a_3\Sigma x_1x_3 + \cdots \\
Nzd_2 &= a_1\Sigma x_1x_2 + a_2\Sigma x_2^2 + a_3\Sigma x_2x_3 + \cdots \\
Nzd_3 &= a_1\Sigma x_1x_3 + a_2\Sigma x_2x_3 + a_3\Sigma x_3^2 + \cdots
\end{aligned}$$

where N is the total number of cases, z is the height of the ordinate dividing the normal curve of unit area into p and q parts, and the d's are the differences in the means of the numerically expressed variables in the dichotomy. Thus in an attrition-survival study, d is the mean in the survival group minus the mean in the attrition group for any numerical variable. The right-hand members of the equations are the same as those in multiple regression analysis using total deviation values, i.e., values from the general mean. The left-hand members of the equations are comparable to the $\Sigma xy$-values in the multiple regression normal equations.

An example will clarify the procedure to be followed. A group of 260 freshman engineering students at a midwestern college had, by the beginning of the fall quarter following the one during which they had entered, been reduced in number to 176. Information was available about these students concerning certain variables presumably related to success. Two of these variables, score on the American Council on Education Psychological Examination and high school grade-point average, are available each year from routine university testing. In addition, this particular group was given the Owens-Bennett Test of Mechanical Comprehension.

For purposes of explanation, the data for the 260 students needed for 
analysis are: 


$$N = 260 \qquad \Sigma X_1 = 28{,}420 \qquad \Sigma X_2 = 669.71 \qquad \Sigma X_3 = 10{,}261$$
$$\Sigma X_1^2 = 3{,}220{,}842 \qquad \Sigma X_2^2 = 1{,}820.7997 \qquad \Sigma X_3^2 = 420{,}303$$
$$\Sigma X_1X_2 = 75{,}007.81 \qquad \Sigma X_1X_3 = 1{,}136{,}657 \qquad \Sigma X_2X_3 = 26{,}763.26$$

where N is the number of cases; $X_1$ is the score on the American Council on Education Psychological Examination; $X_2$ is high school grade-point average; $X_3$ is the score on the Owens-Bennett Test of Mechanical Comprehension.
The sums of squares and cross-products are reduced to deviation sums from the general mean in the usual manner and are found to be:

$$\Sigma x_1^2 = 114{,}317.38 \qquad \Sigma x_2^2 = 95.75553 \qquad \Sigma x_3^2 = 15{,}348.69$$
$$\Sigma x_1x_2 = 1{,}803.3554 \qquad \Sigma x_1x_3 = 15{,}050.769 \qquad \Sigma x_2x_3 = 332.8973$$


The data for the survival and attrition groups are shown in Table 109. The z-value of 0.35905 was found from the table of the normal curve.


TABLE 109. Mean Scores for the Survival and Attrition Groups

                         SURVIVAL    ATTRITION   DIFFERENCE IN
NUMERICAL VARIABLE       (N = 176)   (N = 84)    MEANS (d)          Nzd
A.C.E.                   113.4943    100.5357    12.9586       1,209.7242
High School Average        2.7106      2.2935     0.4171          38.9375
Owens-Bennett             40.7386     36.7976     3.9410         367.9042


When these values are substituted in the simultaneous equations needed for solution of the discriminant equation, they become:

$$\begin{aligned}
1{,}209.7242 &= 114{,}317.38a_1 + 1{,}803.3554a_2 + 15{,}050.769a_3 \\
38.9375 &= 1{,}803.3554a_1 + 95.75553a_2 + 332.8973a_3 \\
367.9042 &= 15{,}050.769a_1 + 332.8973a_2 + 15{,}348.69a_3
\end{aligned}$$

When these equations are solved simultaneously, the following values are obtained:¹

$$a_1 = 0.00443837 \qquad a_2 = 0.275629 \qquad a_3 = 0.0136394$$
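Solving the normal equations is ordinary linear algebra. The sketch below is a small Gaussian elimination with partial pivoting, written in plain Python so that it needs no third-party library; it stands in for the desk calculator and is not the authors' own computing routine. It recovers the three weights above from the deviation sums of squares and cross-products and the Nzd values of Table 109, and the same function gives the two-variable weights when the Owens-Bennett row and column are dropped.

```python
def solve_linear(a, b):
    """Solve a x = b for a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Deviation sums: x1 = A.C.E., x2 = high school average, x3 = Owens-Bennett.
sscp = [
    [114_317.38, 1_803.3554, 15_050.769],
    [1_803.3554, 95.75553, 332.8973],
    [15_050.769, 332.8973, 15_348.69],
]
nzd = [1_209.7242, 38.9375, 367.9042]  # left-hand members N z d

a1, a2, a3 = solve_linear(sscp, nzd)
b1, b2 = solve_linear([row[:2] for row in sscp[:2]], nzd[:2])  # battery without Owens-Bennett
```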


The discriminant equation then becomes:

$$v = 0.00443837x_1 + 0.275629x_2 + 0.0136394x_3$$

where

v = score in deviation form

$x_1$ = deviation A.C.E. score

$x_2$ = deviation high school grade-point average

$x_3$ = deviation Owens-Bennett score


One way of computing a multiple biserial correlation would be to use the discriminant equation and predict the v-score for each individual. The standard deviation of these predicted scores could then be computed together with the difference in the means, Δ, of predicted scores for the survival and attrition groups. Then substitution could be made in the traditional formula

$$R_{bis} = \frac{\Delta}{\sigma_v}\left(\frac{pq}{z}\right)$$

¹ These values were carried out to many more places than needed for prediction for any given student in order to obtain values which would satisfy the original equations and the calculations which follow. It is probably desirable to carry as many places as can be obtained from an eight-bank or a ten-bank calculating machine.


Fortunately this labor is unnecessary. The difference between the predicted means can be obtained from

$$\Delta = a_1Nzd_1 + a_2Nzd_2 + a_3Nzd_3$$

or, in this particular case,

$$\Delta = (0.00443837)(1{,}209.7242) + (0.275629)(38.9375) + (0.0136394)(367.9042) = 21.11951$$

It is not difficult to show, also, that

$$R_{bis} = \frac{pq}{z^2}\sqrt{\frac{\Delta}{N}} = 1.69654\sqrt{\frac{21.11951}{260}} = 0.484$$
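The chain from the weights to Δ and then to the two multiple coefficients takes only a few lines. In this sketch the ordinate z is computed from p = 176/260 with the standard-library normal curve rather than read from a table, so the constant pq/z² comes out as about 1.6963 instead of the 1.69654 printed above, and R_bis agrees with 0.484 only to within rounding. The variable names are illustrative.

```python
import math
from statistics import NormalDist

weights = [0.00443837, 0.275629, 0.0136394]
nzd = [1_209.7242, 38.9375, 367.9042]

# Delta = sum of a_i * N z d_i, the analogue of the sum of squares for regression.
delta = sum(a * v for a, v in zip(weights, nzd))

n = 260
p = 176 / n          # proportion surviving
q = 1.0 - p
std = NormalDist()
z = std.pdf(std.inv_cdf(q))  # ordinate at the survival/attrition cut

r_bis = (p * q / z**2) * math.sqrt(delta / n)        # traditional multiple biserial R
r_pb = (math.sqrt(p * q) / z) * math.sqrt(delta / n)  # multiple point biserial R
```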


If the more conservative method of obtaining a coefficient of correlation is desired, the multiple point biserial coefficient may be obtained by the formula

$$R_{pb} = \frac{\sqrt{pq}}{z}\sqrt{\frac{\Delta}{N}} = \sqrt{\frac{\Delta}{N\left(\frac{z^2}{pq}\right)}}$$

The latter formula, perhaps, is the most convenient to use, since tabled values of $z^2/pq$ are shown in the Appendix. In this table $z^2/pq$ for a 0.323 and 0.677 dichotomy is 0.5893. Thus the point biserial correlation is

$$R_{pb} = \sqrt{\frac{21.11951}{(260)(0.5893)}} = 0.371$$

and the correction for coarse grouping suggests

R = (0.371)(1.218) = 0.452

Whatever the preference in formulas for a suitable descriptive coefficient of correlation, each calls for the same test of significance, which apparently is not open to question, namely the use of the formula

$$F_{m,\,N-m-1} = \frac{\Delta(N - m - 1)}{\left(N\frac{z^2}{pq} - \Delta\right)m}$$

where m is the number of numerical variables in the discriminant equation. Thus, in this example,

$$F_{3,\,256} = \frac{(21.11951)(256)}{(153.257 - 21.11951)(3)} = 13.64$$

The F-value of 13.64 with 3 and 256 degrees of freedom is significant far beyond the 1 per cent level.
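The F-ratio treats $Nz^2/pq$ as the total sum of squares of the criterion and Δ as the part accounted for by the m predictors. A minimal sketch follows (function name hypothetical); the value 153.257 for $Nz^2/pq$ is taken from the substitution above, which carries more places than (260)(0.5893).

```python
def f_multiple(delta: float, total: float, n_cases: int, m: int) -> float:
    """F = Delta (N - m - 1) / ((N z^2/pq - Delta) m), with m and N - m - 1 degrees of freedom.

    `total` is N z^2/pq, the analogue of the total sum of squares of the criterion.
    """
    return delta * (n_cases - m - 1) / ((total - delta) * m)


F = f_multiple(21.11951, 153.257, 260, 3)
```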

Many times it is desirable to indicate the relative effectiveness of vari- 
ables used in prediction. This cannot be done by noting the relative size 
of the coefficients in the discriminant equation. Probably the most feasible method for obtaining relative effectiveness lies in noting the contribution of each variable to the numerical value of Δ, which corresponds to the sum of squares for regression in an analysis dealing with a numerical criterion. Thus, in the solution for Δ, the contributions were:

A.C.E.                   5.3692   or 25 per cent
High School Average     10.7323   or 51 per cent
Owens-Bennett            5.0180   or 24 per cent
Total                   21.1195


In case any of these contributions should carry a negative sign, the sign is 
disregarded in determining relative effectiveness. 
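The percentage contributions can be reproduced from the weights and the Nzd values. How the percentages should be formed when some contributions are negative is not spelled out in the text; the sketch assumes each absolute contribution is divided by the sum of absolute contributions, which coincides with dividing by Δ here because all three contributions are positive.

```python
weights = {"A.C.E.": 0.00443837, "High School Average": 0.275629, "Owens-Bennett": 0.0136394}
nzd = {"A.C.E.": 1_209.7242, "High School Average": 38.9375, "Owens-Bennett": 367.9042}

# Each variable's contribution to Delta is a_i * N z d_i.
contrib = {name: weights[name] * nzd[name] for name in weights}
delta = sum(contrib.values())

# Signs are disregarded (assumption: shares are of the summed absolute contributions).
abs_total = sum(abs(c) for c in contrib.values())
shares = {name: round(100 * abs(c) / abs_total) for name, c in contrib.items()}
```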

The percentage contribution of each of these variables to effective fore- 
casting of attrition-survival in this group of students is highest for the 
high school grade-point average. It can be seen that this procedure is 
identical with that used for computing the relative effectiveness of the 
variables in a multiple regression equation. 

The relative importance, here shown, applies only when this battery of 
three variables is considered. Should a variable be removed from the bat- 
tery the relative importance of the remaining variables may change radi- 
cally. For example, the American Council on Education Psychological 
Examination and the Owens-Bennett Test of Mechanical Comprehension 
are approximately equally effective in forecasting in this group when used 
together with high school grade-point average. Their relative importance 
if the high school grade-point average is not considered cannot be inferred 
from the evidence of the three-variable discriminant equation. 

The problem of adding or deleting tests in a student personnel program 
is ever present. The time of students and faculty, together with the cost 
involved, suggests that evidence should be assembled to justify the tests 
used in a university-wide testing program. In the situation previously 
described with engineering students, high school grade-point averages and 
A.C.E. scores are assembled as a matter of administrative routine. Will 
there be a significant loss in forecasting attrition-survival if the Owens- 
Bennett Test of Mechanical Comprehension is eliminated from the bat- 
tery? 

Perhaps the most straightforward method to attack this problem is to compute a two-variable discriminant equation ignoring the Owens-Bennett scores. The simultaneous equations needed for solving for the discriminant equation are:

$$\begin{aligned}
Nzd_1 &= a_1\Sigma x_1^2 + a_2\Sigma x_1x_2 \\
Nzd_2 &= a_1\Sigma x_1x_2 + a_2\Sigma x_2^2
\end{aligned}$$

After the substitution of the appropriate values,

$$\begin{aligned}
1{,}209.7242 &= 114{,}317.38a_1 + 1{,}803.3554a_2 \\
38.9375 &= 1{,}803.3554a_1 + 95.75553a_2
\end{aligned}$$

Solution of the equations yields:

$$a_1 = 0.005928916 \qquad a_2 = 0.2949757$$

Substituting in the formula for the predicted difference in v-scores:

$$\Delta = a_1Nzd_1 + a_2Nzd_2 = (0.005928916)(1{,}209.7242) + (0.2949757)(38.9375) = 18.65797$$
This predicted difference, when inserted in the traditional formula

$$R_{bis} = \frac{pq}{z^2}\sqrt{\frac{\Delta}{N}} = 1.69654\sqrt{\frac{18.65797}{260}} = 0.455$$

yields a multiple biserial coefficient of correlation of 0.455, as contrasted with a correlation of 0.484 when the Owens-Bennett scores were included in the battery.

The more conservative estimate of correlation, and the one here recommended, can be obtained by adjusting the multiple point biserial R as follows:

$$R_{pb} = \frac{\sqrt{(0.323)(0.677)}}{0.35905}\sqrt{\frac{18.65797}{260}} = 0.349$$

and

R = (0.349)(1.222) = 0.426

This latter value, as expected, is slightly lower than that obtained by the traditional biserial formula.


Regardless of the descriptive coefficient chosen, the test of significance from a zero relationship is

$$F_{n,\,N-m-1} = \frac{(\Delta_m - \Delta_{m-n})(N - m - 1)}{\left(N\frac{z^2}{pq} - \Delta_m\right)n}$$

where n is the number of variables eliminated, and $\Delta_m$ and $\Delta_{m-n}$ are the values of Δ for the full battery of m variables and for the reduced battery. Thus, in the foregoing example,

$$F_{1,\,256} = \frac{(21.11951 - 18.65797)(256)}{(153.257 - 21.11951)(1)} = 4.77$$

This F-value, with 1 and 256 degrees of freedom, is significant at the 5 per cent level. Thus a significant loss ensues when the Owens-Bennett test is eliminated from the forecasting battery.
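The test for the loss in dropping predictors compares the decrease in Δ with the unexplained part of $Nz^2/pq$ for the full battery. A sketch under the same conventions as before (hypothetical function name; 153.257 taken from the text):

```python
def f_elimination(delta_full, delta_reduced, total, n_cases, m, n_dropped):
    """F for the loss when n_dropped of the m predictors are removed.

    Degrees of freedom are n_dropped and N - m - 1; `total` is N z^2/pq.
    """
    loss = delta_full - delta_reduced
    return loss * (n_cases - m - 1) / ((total - delta_full) * n_dropped)


# Dropping Owens-Bennett from the three-variable battery.
F = f_elimination(21.11951, 18.65797, 153.257, 260, 3, 1)
```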




The foregoing test of significance of the loss by elimination of one variable may be considered as a test of the significance of a partial biserial correlation from zero. If the value of such a partial correlation is desired, it may be obtained from the formula

$$r_{01\cdot 23\ldots m} = \sqrt{\frac{\Delta_m - \Delta_{m-1}}{N\frac{z^2}{pq} - \Delta_{m-1}}}$$

The relationship between the Owens-Bennett test scores and attrition-survival, with A.C.E. scores and high school grade-point average held constant, can then be found by substitution as follows:

$$r_{03\cdot 12} = \sqrt{\frac{21.11951 - 18.65797}{153.257 - 18.65797}} = 0.135$$

If a sign is desired for the partial correlation, it is the sign occurring with the coefficient of the eliminated variable in the discriminant equation, which in this particular case is positive.
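The partial coefficient is the square root of the loss in Δ divided by $Nz^2/pq$ less the reduced Δ, and it takes the sign of the dropped variable's weight. The sketch applies that sign rule through a hypothetical `dropped_weight` parameter.

```python
import math


def partial_biserial(delta_full, delta_reduced, total, dropped_weight):
    """Partial r for the dropped variable, signed like its discriminant weight."""
    r = math.sqrt((delta_full - delta_reduced) / (total - delta_reduced))
    return math.copysign(r, dropped_weight)


# Owens-Bennett is dropped; its weight in the three-variable equation is +0.0136394.
r = partial_biserial(21.11951, 18.65797, 153.257, 0.0136394)
```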

The multiple biserial R suggests the degree to which it is possible to predict attrition-survival. It does not yield direct evidence of the kind which the personnel officer needs in counseling. Actually, a device is needed by means of which the counselor can tell any given individual, judging from the available evidence, his chances of survival expressed as a specific probability.

The discriminant equation lends itself to the prediction of such probabilities. It yields upon solution a v value in deviation form for any given individual.
For the engineering students, the deviation formula of two variables is

$$v = 0.005929x_1 + 0.2950x_2$$

To change a formula from deviation form, $v = a_1x_1 + a_2x_2$, to raw score form, the following equation can be used:

$$V - \bar{V} = a_1(X_1 - \bar{X}_1) + a_2(X_2 - \bar{X}_2)$$

For the foregoing problem all values have been determined except $\bar{V}$. This mean is usually determined from the evidence available from the sample in which the discriminant equation was developed. In this example 176, or 67.7 per cent, of the 260 individuals survived. From a normal probability table, this percentage yields a normal deviate of 0.4593, which becomes the $\bar{V}$.

It is, of course, possible to obtain a $\bar{V}$ which is independent of the sample. In certain situations the percentage surviving might be based upon experience over a period of years rather than the small sample available. Again, physical facilities may fix a maximum number that can be accommodated. In any case, the percentage surviving is converted to a normal deviate, which becomes the $\bar{V}$. It should be clear that $\bar{V}$ is negative whenever the percentage surviving is less than 50 per cent and positive when it is greater than 50 per cent.

For the example here studied, $\bar{V}$ has been determined from the sample evidence. The raw score formula for predicting survival in sigma units is

$$V - 0.4593 = 0.005929(X_1 - 109.31) + 0.2950(X_2 - 2.576)$$

$$V = 0.005929X_1 + 0.2950X_2 - 0.9494$$

This equation can be solved for any given individual. Thus for an individual who had an A.C.E. score of 50 and a high school grade-point average of 1.50, the predicted V, in sigma units, would be −0.2104, which yields, upon consulting a table of the normal curve, a probability of survival of 42 in 100. An individual who had an A.C.E. score of 150 and a high school grade-point average of 4.00 would have a probability of survival of 87 in 100. In any attrition-survival study the spread in probabilities will vary depending upon the size of the multiple biserial correlation.


TABLE 110. Probability of Attrition and Survival Based on A.C.E. Scores and High School Grade-Point Averages
(Chances in 100 of Survival)

A.C.E.           HIGH SCHOOL GRADE-POINT AVERAGE
SCORE     1.50    2.00    2.50    3.00    3.50    4.00
160        67      72      77      81      85      88
150        65      70      75      80      83      87
140        63      68      73      78      82      86
130        60      66      71      76      80      84
120        58      64      69      74      79      83
110        56      62      67      72      77      81
100        53      59      65      70      75      79
 90        51      57      63      68      73      78
 80        49      55      60      66      71      76
 70        46      52      58      64      69      74
 60        44      50      56      61      67      72
 50        42      48      53      59      65      70
 40        39      45      51      57      63      68

For convenience, it may be desirable to construct a table of chances in 
100 of survival such as Table 110. Then, whenever the A.C.E. score and 
high school grade-point average for any given student are known, he can 
be advised of his chances in 100 of surviving to the sophomore year in 
engineering based upon the evidence from the earlier group in which the 
discriminant equation was developed. 
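A table such as Table 110 can be generated mechanically from the raw-score equation: compute V for each combination of scores and convert it to an area under the normal curve. This sketch uses the standard-library `NormalDist.cdf` in place of the printed normal table; the function name and the grid of scores are illustrative choices.

```python
from statistics import NormalDist

STD = NormalDist()


def chances_of_survival(ace: float, hs_avg: float) -> int:
    """Chances in 100 of survival from the two-variable raw-score discriminant equation."""
    v = 0.005929 * ace + 0.2950 * hs_avg - 0.9494  # predicted position in sigma units
    return round(100 * STD.cdf(v))


averages = [1.50, 2.00, 2.50, 3.00, 3.50, 4.00]
for ace in range(160, 30, -10):  # A.C.E. scores 160, 150, ..., 40
    print(ace, *(chances_of_survival(ace, g) for g in averages))
```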

At many colleges counselors have available A.C.E. percentile ranks 
rather than A.C.E. scores. It is therefore more convenient to prepare the 
probability table in terms of percentile ranks rather than in terms of raw 
scores, such as Table 111. To construct this table, the percentile rank 




entries needed are converted to raw score equivalents before being inserted into the discriminant equation. It should be noted that a series of tables similar to Tables 110 and 111 could be developed when three or more prediction variables are used.

The use of the discriminant equation in attrition-survival studies is equally applicable to the problem of selection of personnel for any given job. If, on the other hand, the problem in industry consists of the classification of applicants, that is, of deciding which of two jobs any given applicant more satisfactorily fits, the traditional use of the discriminant function is more appropriate.


TABLE 111. Probability of Attrition and Survival Based on A.C.E. Scores and High School Grade-Point Averages
(Chances in 100 of Survival)

PERCENTILE       HIGH SCHOOL GRADE-POINT AVERAGES
RANK      1.50    2.00    2.50    3.00    3.50    4.00
95         64      70      75      79      83      87
85         61      67      72      77      81      85
75         60      65      71      75      80      84
65         58      64      69      74      79      83
55         56      62      68      73      77      82
45         55      61      66      72      76      81
35         54      60      65      70      75      80
25         52      58      64      69      74      79
15         50      56      62      67      73      77
 5         47      53      59      64      70      74

Whenever the problem at hand demands the prediction or the selection 
of the better individuals and the poorer individuals on a given trait ex- 
pressed in a dichotomy, the techniques of this chapter should be followed. 
In certain situations both selection and classification are needed in the 
treatment of the data. The appropriate procedures are shown in Chapter 
19. 


OTHER SERIAL CORRELATIONS 


The techniques described for biserial correlation may be extended to compute other serial correlations. In some situations the linear relationship is desired between a numerical variable and a variable classified into three or more broad groups.

As an example, teachers who graduated from a college might be rated regarding teaching success on a scale of high, medium, and low. If the relationship is desired between grade-point average in college and teaching success, triserial r is needed. Many psychological tests consist of items to which the subject can respond in one of five ways to each item. If the relationship is desired between response to any item and the total score, quintiserial r is appropriate. As with biserial correlation, in other serial correlations the assumption must be made that the segmented variable is normally distributed. The confidence to be placed in this assumption cannot be ascertained from the number of cases in the categories of the segmented variable.

Jaspen¹ reported a general formula for serial correlation which, with some change in notation, is as follows:

$$r_s = \frac{\Sigma\left[(z_l - z_u)\bar{X}\right]}{\sigma\,\Sigma\left[\frac{(z_l - z_u)^2}{p}\right]}$$

where

$z_l$ = height of ordinate at lower end of interval

$z_u$ = height of ordinate at upper end of interval

$\bar{X}$ = mean of the continuous variable in a category

σ = standard deviation of the continuous variable

p = proportion of total group in a category

This formula can be used for all serial correlations regardless of the number of categories in the segmented variable. For a dichotomy it readily reduces² to the traditional biserial formula

$$r_{bis} = \frac{d}{\sigma}\left(\frac{pq}{z}\right)$$


The foregoing formula for any serial correlation, like the traditional formula for biserial correlation, tends to yield an overestimate of the relationship, although the overestimation tends to be less pronounced as the number of segments in the coarsely grouped variable increases. Although the Jaspen formula yields reasonably satisfactory estimates unless the relationship is quite high, the authors prefer the more conservative estimate from the formula for point serial correlation

$$r_{ps} = \frac{\Sigma\left[(z_l - z_u)\bar{X}\right]}{\sigma\sqrt{\Sigma\left[\frac{(z_l - z_u)^2}{p}\right]}}$$

which yields an underestimate of the existing relationship. When this formula is used, the resulting coefficient should be adjusted for coarse grouping by means of the appendix table, according to the number of segments.

¹ Nathan Jaspen, "Serial Correlation," Psychometrika, 11:23-30, 1946.

² For a dichotomy, with the upper category (proportion p, mean $\bar{X}_p$) separated from the lower category (proportion q, mean $\bar{X}_q$) by an ordinate of height z,

$$r = \frac{\Sigma\left[(z_l - z_u)\bar{X}\right]}{\sigma\,\Sigma\left[\frac{(z_l - z_u)^2}{p}\right]}$$

The numerator may be written

$$(z - 0)\bar{X}_p + (0 - z)\bar{X}_q = z(\bar{X}_p - \bar{X}_q) = zd$$

and the denominator

$$\sigma\left[\frac{(z - 0)^2}{p} + \frac{(0 - z)^2}{q}\right] = \sigma z^2\left(\frac{1}{p} + \frac{1}{q}\right) = \frac{\sigma z^2}{pq}$$

The formula then, upon substitution, is

$$r = \frac{zd}{\sigma z^2/pq} = \frac{d}{\sigma}\left(\frac{pq}{z}\right)$$

To illustrate the use of this formula, a quintiserial correlation has been 
computed between item responses and total score on a farming interest 
scale. The score on the scale has been so arranged that the higher the 
score the greater the interest in farming. Of 1767 boys in rural high schools, responses to the item “Farming is Fascinating Work” were as follows:

RESPONSE                N     MEAN TOTAL SCORE
Strongly Agree         313         183.0
Agree                  398         172.5
Uncertain              488         157.4
Disagree               322         146.0
Strongly Disagree      246         132.3


The needed solution is shown in Table 112. The p's are the proportions responding in each category. There are only four entries in the z-column, since four ordinates divide a normal distribution into five segments. These ordinates are obtained as usual from a table of the normal curve. In the $(z_l - z_u)$ column, the height of the ordinate at the top of the interval is subtracted from the height of the one at the bottom. Note that at the top of the highest interval and at the bottom of the lowest interval the height of the ordinate is zero. When the $(z_l - z_u)^2/p$ entries are summed, a value of 0.91129 is obtained for direct substitution in the formula.

The term in the numerator, $\Sigma[(z_l - z_u)\bar{X}]$, is obtained from Table 112 as follows:

$$(0.25973)(183.0) + (0.12719)(172.5) + (-0.02861)(157.4) + (-0.13660)(146.0) + (-0.22171)(132.3) = 15.6918$$

The standard deviation of the total scale scores, the calculation of which is not shown here, was 27.45. The quintiserial r formula becomes

$$r_{ps} = \frac{15.6918}{(27.45)\sqrt{0.91129}} = 0.599$$

which yields an estimated quintiserial coefficient of r = (0.599)(1.032) = 0.618. This estimate is only slightly lower than the value of 0.627 obtained from the Jaspen formula. The extent to which the tendency to agree or disagree with this item is related to total score is indicated by the magnitude of this quintiserial coefficient of correlation.
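Both serial formulas can be computed from the category counts and means alone, generating the ordinates from cumulative proportions instead of reading them from a table. In this sketch (hypothetical function name) categories must be listed from the highest to the lowest; the ordinate above the top category and below the bottom one is taken as zero, as the text notes. For the farming item it reproduces the sum 0.91129 and the coefficients 0.599 and 0.627, and for a two-category split it collapses to the point biserial and traditional biserial values of the item-analysis example.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def serial_r(counts, means, sigma):
    """Point serial and Jaspen serial r for categories ordered from highest to lowest.

    Returns (r_point_serial, r_jaspen, sum of (z_l - z_u)^2 / p).
    """
    n = sum(counts)
    cum = 0.0
    upper = 0.0  # ordinate at the top of the highest category
    numerator = 0.0
    sum_sq = 0.0
    for k, (count, mean) in enumerate(zip(counts, means)):
        p = count / n
        cum += p
        # Ordinate at the lower boundary of this category; zero below the last one.
        lower = 0.0 if k == len(counts) - 1 else STD.pdf(STD.inv_cdf(1.0 - cum))
        diff = lower - upper  # z_l - z_u
        numerator += diff * mean
        sum_sq += diff * diff / p
        upper = lower
    r_ps = numerator / (sigma * math.sqrt(sum_sq))
    r_jaspen = numerator / (sigma * sum_sq)
    return r_ps, r_jaspen, sum_sq


r_ps, r_j, s = serial_r([313, 398, 488, 322, 246], [183.0, 172.5, 157.4, 146.0, 132.3], 27.45)
```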




To test whether the quintiserial r is significantly different from zero, a table of significant values for the product-moment correlation may be used, provided the table is entered with the value of the point serial correlation, which in this case was 0.599. If desired, the formula

$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$

may be used, where again r is $r_{ps}$ and not the adjusted coefficient. Thus in the example shown

$$t = \sqrt{\frac{(0.599)^2(1767 - 2)}{1 - (0.599)^2}} = 31.4$$

Had the Jaspen coefficient been used, it would have been necessary to multiply it by $\sqrt{\Sigma\left[\frac{(z_l - z_u)^2}{p}\right]}$ prior to using the foregoing test of significance. As expected, the relationship between the responses on this item and total farming interest scores is significantly different from zero.

The expression $\Sigma\left[\frac{(z_l - z_u)^2}{p}\right]$, which in the example shown was 0.91129, is not constant for all quintiserial correlations. Its magnitude depends upon the distribution of the frequencies among the five categories.


TABLE 112. Information for Solution of Quintiserial Correlation

                                                             (z_l - z_u)²     MEAN TOTAL
RESPONSE              N       p         z       z_l - z_u        p             SCORE
Strongly Agree       313    0.1771               0.25973       0.38091         183.0
                                     0.25973
Agree                398    0.2253               0.12719       0.07184         172.5
                                     0.38692
Uncertain            488    0.2762              -0.02861       0.00296         157.4
                                     0.35831
Disagree             322    0.1822              -0.13660       0.10245         146.0
                                     0.22171
Strongly Disagree    246    0.1392              -0.22171       0.35313         132.3
Total               1767    1.0000                             0.91129


It is possible to compute a product-moment coefficient of correlation in the foregoing example by assigning values as follows:

Strongly favorable to farming       = 4
Favorable to farming                = 3
Uncertain                           = 2
Unfavorable to farming              = 1
Strongly unfavorable to farming     = 0

Although the necessary information for solution is not shown here, the coefficient of correlation was found to be 0.601. This coefficient, when adjusted for coarse grouping, becomes r = (0.601)(1.032) = 0.620. The correction values for coarse grouping are equally applicable to coefficients of correlation found by assigning unit differences between segments (0, 1, 2, 3, and so on) or by assigning sigma unit differences according to the methods of serial correlation.


MULTIPLE SERIAL CORRELATION 


It is possible to extend the methods for multiple biserial correlation to situations involving more than two categories in the segmented variable. For a multiple point biserial R, the formula is

$$R_{pb} = \sqrt{\frac{\Delta}{N\left(\frac{z^2}{pq}\right)}}$$

which is a special case of the general formula for any multiple point serial correlation,

$$R_{ps} = \sqrt{\frac{\Delta}{N\,\Sigma\left[\frac{(z_l - z_u)^2}{p}\right]}}$$


which perhaps may be more conveniently written as 


In the solution of the multiple biserial correlation, information for 260 freshman engineering students furnished the example. The same group of students is here used for illustrating the procedure for other serial correlations.


TABLE 113. Information for Solution of Multiple Quintiserial Correlation

                                                  z_l - z_u   (z_l - z_u)²             ΣX
MATH.                                                                      ----------------------------
MARK     N      p        z       z_l - z_u           p            p        A.C.E.  H.S. AVE.  OWENS-BENNETT
A        41   0.1577               0.24101        1.5283       0.36833      5203    123.13        1815
                       0.24101
B        68   0.2615               0.14972        0.5725       0.08572      7711    181.26        2732
                       0.39073
C        80   0.3077              -0.05820       -0.1891       0.01117      8566    205.47        3035
                       0.33253
D        44   0.1692              -0.15220       -0.8995       0.13691      4485    104.65        1722
                       0.18033
F        27   0.1039              -0.18033       -1.7356       0.31298      2455     55.20         957
Total   260   1.0000                                           0.91511




The mark in first-quarter mathematics provided a five-category seg- 
mented variable. The needed information is shown in Table 113 for solu- 
tion of the left-hand members of the following equations: 

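$$\begin{aligned}
\sum x_1y &= a_1\sum x_1^2 + a_2\sum x_1x_2 + a_3\sum x_1x_3\\
\sum x_2y &= a_1\sum x_1x_2 + a_2\sum x_2^2 + a_3\sum x_2x_3\\
\sum x_3y &= a_1\sum x_1x_3 + a_2\sum x_2x_3 + a_3\sum x_3^2
\end{aligned}$$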


The y-value, $(z_l - z_u)/p$, for an "A" is 1.5283, which when multiplied by
the sum of the ACE scores gives the contribution to $\sum x_1y$ of the 41
students who received "A"s. Thus $\sum x_1y$ will be

$$(1.5283)(5203) + (0.5725)(7711) + (-0.1891)(8566) + (-0.8995)(4485) + (-1.7356)(2455) = 2451.3063$$

The values of $\sum x_2y$ and $\sum x_3y$ are found in a similar way.
The values for the right-hand members of the foregoing equations are
available from the solution for multiple biserial correlation. The equations,
then, are:


$$\begin{aligned}
2451.3063 &= 114{,}317.38a_1 + 1803.3554a_2 + 15{,}050.769a_3\\
63.158757 &= 1803.3554a_1 + 95.75558a_2 + 332.8973a_3\\
554.1078 &= 15{,}050.769a_1 + 332.8973a_2 + 15{,}348.69a_3
\end{aligned}$$

Upon solution, the discriminant function becomes:


$$y = 0.0140793x_1 + 0.3427638x_2 + 0.01486108x_3$$

The value of λ, corresponding to the sum of squares for regression, is:


$$(0.0140793)(2451.3063) + (0.3427638)(63.158757) + (0.01486108)(554.1078) = 64.3959$$

Substituting in the formula for serial correlation,

$$R_s = \sqrt{\frac{64.3959}{(260)(0.91511)}} = 0.520$$

which yields a quintiserial coefficient, upon adjusting for coarse grouping, of

$$R = (0.520)(1.035) = 0.538$$


The value thus obtained differs but little from that of 0.544 produced by
the Jaspen formula or from that of 0.514 produced by the usual product-moment
method with numerical values assigned as follows: A = 4; B = 3; C = 2;
D = 1; F = 0. In any case, the quintiserial correlation is designed to
indicate the degree to which mathematics achievement may be forecast from
the American Council on Education Psychological Examination scores, the
high school grade-point average, and the Owens-Bennett Test
of Mechanical Comprehension. 
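The three normal equations can be solved by any standard method of simultaneous solution; the text does not show the working. The sketch below is a minimal illustration rather than the authors' procedure: it uses plain Gaussian elimination with partial pivoting (the helper name `solve` is hypothetical), then forms λ, the unadjusted R, and the coarse-grouping adjustment with the factor 1.035 used above.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]   # augmented matrix
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                            # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


# Deviation sums of squares and cross products for ACE (x1), H.S. average (x2),
# and Owens-Bennett (x3), with the sums of x*y obtained from Table 113.
sxx = [[114317.38, 1803.3554, 15050.769],
       [1803.3554, 95.75558, 332.8973],
       [15050.769, 332.8973, 15348.69]]
sxy = [2451.3063, 63.158757, 554.1078]

weights = solve(sxx, sxy)
lam = sum(w * s for w, s in zip(weights, sxy))    # sum of squares for regression
N, SUM_PY2 = 260, 0.91511                         # N and the total from Table 113
r_serial = (lam / (N * SUM_PY2)) ** 0.5
r_adjusted = r_serial * 1.035                     # correction for coarse grouping
print([round(w, 7) for w in weights], round(lam, 4), round(r_adjusted, 3))
```

Partial pivoting is not strictly needed for a system this well conditioned, but it costs one line and keeps the helper safe for other batteries.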

To test whether a multiple serial correlation coefficient is different from
zero, a convenient formula is

$$F_{m,\,N-m-1} = \frac{\lambda\,(N - m - 1)}{\left(N\sum p y^2 - \lambda\right) m}$$

where m is the number of predicting variables. Thus, in the foregoing example,

$$F_{3,\,256} = \frac{(64.3959)(260 - 3 - 1)}{\left[(260)(0.91511) - 64.3959\right](3)} = 31.67$$

Frequently, a decision must be made concerning the advisability of
omitting one or more variables from a prediction battery. In the foregoing
example, if the Owens-Bennett Test of Mechanical Comprehension is deleted
from the battery, the serial function becomes

$$y = 0.0157034x_1 + 0.3638433x_2$$

and yields a λ of 61.4736 and an unadjusted multiple quintiserial correlation
coefficient of 0.508, as contrasted with 0.520 (0.538 after adjustment for
coarse grouping) when all three variables are used. To test this loss for
significance, substitution may be made in the formula

$$F_{m-n,\,N-m-1} = \frac{(\lambda_m - \lambda_n)(N - m - 1)}{\left(N\sum p y^2 - \lambda_m\right)(m - n)}$$

where $\lambda_m$ and $\lambda_n$ are the sums of squares for regression with
all m variables and with the n variables retained. Here

$$F_{1,\,256} = \frac{(64.3959 - 61.4736)(256)}{\left[(260)(0.91511) - 64.3959\right](1)} = 4.31$$

Although the loss is numerically small when the Owens-Bennett test is
eliminated, it is significant at the 5 per cent level.

Serial correlation from discriminant analysis is especially useful in
situations in which the criterion can, at best, be segmented into a few
coarse groups. In such situations, the product-moment correlation
underestimates the existing relationship, and the magnitude of this
underestimation is a function of the number of classes in the segmented
variable. Except for the most exacting analysis, it is doubtful whether
serial analysis differs enough from product-moment correlation analysis,
when more than five classes appear in the segmented variable, to raise a
question concerning which is the more appropriate.

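Both F-ratios above can be scripted directly from λ, N, the total Σpy² (0.91511), and the number of predictors. In this minimal sketch the function names are hypothetical, `m` is the number of predictors in the full battery, and `k` is the number retained after deletion.

```python
def f_multiple_serial(lam, n, sum_py2, m):
    """F with m and n - m - 1 df: does a multiple serial R differ from zero?"""
    return lam * (n - m - 1) / ((n * sum_py2 - lam) * m)


def f_loss(lam_full, lam_reduced, n, sum_py2, m, k):
    """F with m - k and n - m - 1 df for dropping m - k of m predictors."""
    return ((lam_full - lam_reduced) * (n - m - 1)
            / ((n * sum_py2 - lam_full) * (m - k)))


f_zero = f_multiple_serial(64.3959, 260, 0.91511, m=3)
f_drop = f_loss(64.3959, 61.4736, 260, 0.91511, m=3, k=2)   # Owens-Bennett deleted
print(round(f_zero, 2), round(f_drop, 2))
```

The resulting ratios are then referred to an F table with the degrees of freedom given in each docstring.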

Exercises 
1. Of 50 farm boys who graduated from high school, 20 had migrated to urban
communities within a year. I.Q.'s (X1) and scores on a farming attitude scale
(X2) were available as follows:

                       MIGRATED
                YES          NO          TOTAL
VARIABLE     (k = 20)     (k = 30)     (N = 50)
  ΣX1          2116         3024         5140
  ΣX2          3101         5324         8425

ΣX1² = 532,636     ΣX2² = 1,442,333     ΣX1X2 = 863,541




a. Compute the biserial correlation coefficients between migration tendency
and the I.Q.'s and farming attitude scores.

b. Compute the multiple biserial correlation coefficient when the migration
tendency is predicted from the I.Q. and farming attitude score in a
discriminant equation.

c. Construct a table of chances in 100 of migration based upon both I.Q.
(from 85 to 130) and attitude scores (from 120 to 210).

2. Of 66 pupils who graduated from a high school one June, 24 matriculated in 
college the following September. In an attempt to predict tendency to enter 
college the September following the June in which graduation occurred, the high 
school principal tabulated the intelligence quotients, mathematics placement test 
scores, and reading speed scores for all pupils. 

a. Compute the biserial correlation coefficients between the tendency to
enter college and each of the numerical variables. Is each coefficient
significantly different from zero?

b. Find the discriminant equation for predicting tendency to enter college 
from known values of the three numerical variables. Compute the mul- 
tiple biserial correlation coefficient. 

c. Determine the relative importance of each of the numerical variables in 
the prediction scheme. Will the numerical variable which, in terms of the 
biserial correlation, is most highly related to a dichotomous criterion 
necessarily make the greatest contribution to the prediction of that 
criterion when included in a prediction scheme? 

d. Which numerical variable in the discriminant equation is making the 
smallest contribution to effective prediction of the tendency to enter 
college? Is it possible to eliminate this variable from the discriminant 
equation without incurring a significant loss in ability to predict? 

e. Find the second-order partial biserial correlation coefficient between the
dichotomy and the numerical variable which is contributing the least to
the three-variable discriminant equation.

f. Change the three-variable discriminant equation from deviation-score
form to raw-score form. Consider the dividing point between the two
parts of the dichotomy to be satisfactory for determining V.


        ENTERED COLLEGE                      DID NOT ENTER COLLEGE
                MATH. READING                        MATH. READING
PUPIL    I.Q.   SCORE   SCORE         PUPIL    I.Q.   SCORE   SCORE
    1     127      32      86             1     111      42      80
    2     115      18      60             2      98      12      45
    3     131      41      72             3     103      16      68
    4     110      44      50             4     114      24      56
    5     117     [?]      51             5     105       6      50
    6     106      20      45             6     121      30      73
    7     130      21       7             7     110      24      63
    8     119      59      91             8     103      20      62
    9     117      18      64             9      89      13      51
   10     101      22      52            10     116      28      62
   11     109      22      73            11      95       7      41
   12     116      11      74            12     109      14      47
   13     107      36      57            13     110      21      47
   14     109      14      55            14     112      24      89
   15     113      21      51            15     [?]      15      55
   16     124      50      73            16     113      19      67
   17     132      27      78            17      96      22      49
   18     122      52      67            18     112      25      61
   19     109      29      76            19     125      42      83
   20     121      27      74            20     102      18      47
   21     142      49      94            21     119      36      77
   22     118      30      86            22     100      13      34
   23     112      13      63            23     111      22      54
   24     115       1      58            24     108      21      63
                                         25     107      46      47
                                         26     126      24      66
                                         27     113      21      48
                                         28      95      14      40
                                         29     107      32      64
                                         30     108      28      88
                                         31     102      11      45
                                         32     111      24      37
                                         33     113      21      61
                                         34     109      31      54
                                         35     104      16      39
                                         36     101      13      35
                                         37     118      34      54
                                         38     116      15      56
                                         39     114      30      74
                                         40     120      36      88
                                         41     105       9      38
                                         42     109      21      54


3. All science students of a university were required to complete a sequence
of two zoology courses. In one academic year, of 108 students who attempted the
sequence for the first time, 76 successfully finished both courses. The scholastic
aptitude scores (X1), high school grade-point averages (X2), and zoology pretest
scores (X3) are summarized below. The high school grade-point average was
computed on the basis of A = 1; B = 2; C = 3; and so forth.

               PASSED        FAILED        TOTAL
VARIABLE      (k = 76)      (k = 32)     (N = 108)
  ΣX1           8,852         3,437        12,289
  ΣX2          170.32         89.04        259.36
  ΣX3           1,140           321         1,461

ΣX1² = 1,423,927      ΣX1X2 = 28,884.33
ΣX2² = 664.5004       ΣX1X3 = 171,492
ΣX3² = 25,207         ΣX2X3 = 3,280.74


a. Compute the biserial correlation coefficients between the tendency to
complete the sequence in one academic year and each of the following:
scholastic aptitude scores, high school grade-point averages, and zoology
pretest scores.

b. Find the discriminant equation by means of which tendency to complete
the two courses can be predicted from a combination of the three
numerical variables. Compute the multiple biserial correlation coefficient
and test whether it is significantly different from zero.

c. Will there be a significant loss in predicting the tendency to complete
the sequence of courses if the scholastic aptitude scores (X1) are dropped
from the discriminant equation?

d. Adjust the discriminant equation containing three numerical variables so
that sigma scores can be predicted directly from raw scores of the
numerical variables.

e. What would be the predicted probability of survival for a science
student who was entering the first course of the sequence for the first
time and who had a scholastic aptitude test score of 101, a high school
grade-point average of 2.88, and a zoology pretest score of 8? If the
student's scores were 128, 1.46, and 21 respectively, what would be his
predicted probability of survival?

4. An investigator desired to ascertain the degree to which pupil mortality in
high school could be predicted from elementary school grade-point averages
(X1), intelligence quotients (X2), and age at entrance to high school (X3). The
X3-variable was tabulated in terms of age in months in excess of 12 years.
Complete data were available for 198 eighth-grade pupils, 156 of whom later
graduated from high school.


               SURVIVAL      ATTRITION       TOTAL
VARIABLE      (k = 156)      (k = 42)      (N = 198)
  ΣX1           391.80         83.35         475.15
  ΣX2           16,317         3,902         20,219
  ΣX3            4,310         1,554          5,864

ΣX1² = 1,218.4695     ΣX1X2 = 49,662.21
ΣX2² = 2,098,253      ΣX1X3 = 13,613.23
ΣX3² = 189,950        ΣX2X3 = 585,026


a. The biserial correlation between tendency to graduate from high school
and intelligence quotients has been found to be 0.517. Find the biserial
correlation values between tendency to graduate from high school and
elementary grade-point average and age at entrance to high school. Cite
possible reasons why the sign and size of the biserial correlation involving
age at entrance to high school should not be surprising.


b. Determine the discriminant equation for predicting tendency to graduate
from high school from the three numerical variables. Test the resulting
multiple biserial correlation coefficient for significance from zero.

c. Since elementary school grade-point averages (X1) are often difficult to
obtain, dropping this variable from the discriminant equation may well
enhance the utility of the scheme. Will the elimination of this variable
seriously lessen the ability to predict tendency to graduate from high
school?

d. Modify the discriminant equation which includes only X2 and X3 as
prediction variables so that the probability of survival can be predicted
directly from the raw scores of the prediction variables.




e. Construct a table of chances in 100 of survival so that whenever an 
eighth-grade pupil’s intelligence quotient and age at entrance to high 
school expressed in terms of months in excess of 12 years are known, a 
counselor can readily inform him of his chances in 100 of graduating 
from high school. 

5. Data regarding the intelligence quotient, age, and 8th-grade course mark
average were available for 248 ninth-grade pupils who could be classified into
four groups as follows: (1) High School Attrition (did not graduate); (2) High
School Survival (graduated but did not enter college); (3) College Attrition
(dropped from college before end of first year); (4) College Survival (remained
longer than one year).

                                 ΣX1          ΣX2                 ΣX3
GROUP                    N       I.Q.   AGE (CA − 144 mo)   8TH-GRADE AVERAGE
College Survival         63      6833         1642                163.61
College Attrition        27      2728          721                 61.33
High School Survival     88      8596         2527                182.30
High School Attrition    70      6594         2318                124.64
Total                   248     24751         7208                531.88
Zr = 57,943.32 Daz = 12,440.97 312 = 83.83 Zajra = —18,710.84 
Pax = 134.82 Stary = 448.98 Dry = 1195.59 Ze = —606.23 


Zen = 68.89 


a. Compute the three quadriserial correlation coefficients between the four-
category classification and each of the prediction variables.

b. Compute the multiple quadriserial correlation coefficient using the three
prediction variables.

c. Can any of the prediction variables be dropped from the discriminant
equation without significant loss?


15 


Nonlinear Regression 


Many relationships found in educational and psychological data are 
nonlinear, in which case linear regressions underestimate the existing re- 
lationship. The procedure followed in developing suitable regression for- 
mulas is known as curve fitting. There are two steps in the procedure: 
first, deciding upon the type of curve to be utilized, and second, determin- 
ing the most appropriate constants for the type of curve chosen. 

Choosing an appropriate curve is a matter of experience, although successive
trial fittings of various curves will yield some evidence of the
best-fitting type. However, it should be pointed out that an investigator 
may have difficulty justifying the trial of a curve which violates logical 
theory in the area under investigation. 

Occasionally an analysis of regression is undertaken in which no a priori 
knowledge suggests the nature of the regression. Many procedures are use- 
ful for detecting nonlinear relationships. Probably the three most fre- 
quently used are (1) inspecting the plotted data, (2) contrasting analysis 
of linear regression with analysis of variance of grouped data, and (3) 
contrasting analysis of linear regression with quadratic regression. 

None of the foregoing procedures, inconclusive though each may be, should
be attempted when existing knowledge already indicates the nature of the
relationship. For example, a test designed to show that the relationship
between the speed of a car and the distance required to stop is nonlinear
would seem foolish to anyone with an elementary knowledge of physics.
Likewise, it seems foolish to assume a nonlinear relationship between the
scores on the odd and even items of any test.

In contrast to the foregoing obvious regressions, there are many situa- 
tions in which no generalization of linearity or nonlinearity is available 
from existing theory. As an example, it may be that the relationship 
between age of farm boys and attitude toward farming is unknown. It 
is, then, necessary to decide upon linearity or nonlinearity on the evi- 
dence resulting from appropriately assembled data. 

NONLINEAR REGRESSION 283 


A recent survey of 344 farm boys included several items which could 
be evaluated in terms of attitude toward farming. The scores (Y) varied 
from zero, the least favorable, to 7, the most favorable attitude toward 
farming as shown in Table 114. A cursory inspection of the data fails to 
indicate any relationship, either linear or nonlinear. Additional evidence 
may be had by computing means. For example, the mean age for farm 
boys with the least favorable attitude toward farming is 14.7 years and 


TABLE 114. Age and Attitude of Farm Boys Toward Farming

ATTITUDE                               AGE                                      MEAN
SCORE      10    11    12    13    14    15    16    17    18    19    20   TOTAL   AGE
  0       [frequencies illegible in source]                                   28   14.7
  1       [frequencies illegible in source]                                   31   15.0
  2        4     5     3     2     3     1     8     4     8     5     1       39   15.1
  3        3    14     9    13     1     6    10     3     4     9     2       82   14.4
  4        8     4     5     7     6     8     7     0     5     6     2       68   14.8
  5       [frequencies illegible in source]                                   46   14.3
  6        2     3     1     2     1     5     4     2     2     2     1       25   14.9
  7       [frequencies illegible in source]                                   25   14.2
  ΣY     118   122   114   133   107   122   127    86    99   109    45     1182
  N       29    33    34    35    39    37    39    25    32    30    11      344
  Ȳ     4.07  3.70  3.35  3.80  2.74  3.30  3.26  3.44  3.09  3.63  4.09     3.44


the mean age for the most favorable attitude is 14.2 years. Inspection of 
the intervening years suggests little promise of any relationship, either 
linear or nonlinear. 

The mean attitude for each of the age levels is shown in Table 114.
These mean values suggest that if a relationship exists, it is probably
nonlinear, with attitude toward farming becoming less favorable from the
age of 10 years until the middle teens and then becoming more favorable
until the age of 20 years.

The foregoing nonlinearity is none too apparent from the mean attitude
for each age. The relationship, particularly in small groups, may be more
apparent when the mean differences are smoothed. One technique for this
purpose is that of moving averages. For example, the mean attitude of all
boys aged 10, 11, and 12 years may represent age 11 years; that of boys aged
11, 12, and 13 years may represent age 12 years; and so on. The mean attitude
for this three-year moving average would be as follows:


Age    Ȳ      Age    Ȳ      Age    Ȳ
 11   3.69     14   3.26     17   3.25
 12   3.62     15   3.10     18   3.38
 13   3.28     16   3.32     19   3.47

A five-year moving average would be

Age    Ȳ      Age    Ȳ
 12   3.49     16   3.15
 13   3.36     17   3.33
 14   3.28     18   3.40
 15   3.29


Both the three-year and five-year moving averages suggest the existence
of a nonlinear relationship, although no test of significance supports such
an inference. 
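The moving averages above pool the boys in each window rather than averaging the printed column means: for age 11, (118 + 122 + 114)/(29 + 33 + 34) = 3.69, whereas the simple mean of the three column means would be about 3.71. A short sketch, using only the ΣY and N rows of Table 114, reproduces both smoothed series; the function name is illustrative.

```python
# Column totals of Table 114 for ages 10 through 20
ages = list(range(10, 21))
sum_y = [118, 122, 114, 133, 107, 122, 127, 86, 99, 109, 45]
counts = [29, 33, 34, 35, 39, 37, 39, 25, 32, 30, 11]


def pooled_moving_average(width):
    """Mean attitude of all boys in each window of `width` consecutive ages,
    keyed by the middle age of the window (width is assumed odd)."""
    half = width // 2
    result = {}
    for i in range(half, len(ages) - half):
        window = slice(i - half, i + half + 1)
        result[ages[i]] = sum(sum_y[window]) / sum(counts[window])
    return result


three_year = pooled_moving_average(3)
five_year = pooled_moving_average(5)
print({age: round(v, 2) for age, v in five_year.items()})
```

Pooling weights each age by its number of boys, so the small group at age 20 (11 boys) does not pull the smoothed curve as strongly as an unweighted average would.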

One method for obtaining mathematical support for the suspicion of 
nonlinearity consists of comparing the sum of squares for linear regression 
with the sum of squares for grouped data. For computational convenience 
in the analysis of regression, ten years was subtracted from each age (X1) 
to yield an X-value for predicting a Y-value, attitude toward farming. 

The values needed for determining a and C in the linear regression,
Y = aX + C, are

N = 344           ΣY = 1,182
ΣX = 1,592        ΣY² = 5,258
ΣX² = 10,231      ΣXY = 5,374


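The normal equations are

$$\begin{aligned}
\sum XY &= a\sum X^2 + C\sum X\\
\sum Y &= a\sum X + NC
\end{aligned}$$

which become

$$\begin{aligned}
5{,}374 &= 10{,}231a + 1{,}592C\\
1{,}182 &= 1{,}592a + 344C
\end{aligned}$$

Solving for a and C, the linear regression is

$$Y = -0.033592X + 3.591507$$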
The sum of squares for regression, as usual, is

$$-0.033592(5{,}374) + 3.591507(1{,}182) - \frac{(1{,}182)^2}{344} = 3.231$$

The sum of squares for total is

$$5{,}258 - \frac{(1{,}182)^2}{344} = 1{,}196.593$$


One test for nonlinearity, although not a very demanding one, consists
of comparing this sum of squares with that for groups in an analysis of
variance. The sum of squares for age, as usual, is, from Table 114,

$$\frac{(118)^2}{29} + \frac{(122)^2}{33} + \cdots + \frac{(45)^2}{11} - \frac{(1{,}182)^2}{344} = 49.040$$


This test of nonlinearity is shown in Table 115. Since the resulting F-value
is nonsignificant, little justification for nonlinearity can be found from
this approach. 
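The linear fit and the test in Table 115 need only summary values, so they are easy to reproduce. This minimal sketch solves the two normal equations by Cramer's rule, computes the sums of squares, and forms the F-ratio for departure from linearity, taking the ΣY and N rows of Table 114 as the eleven age groups. It uses nothing beyond the values printed in the text.

```python
# Summary values for the farm-boy data: X = age - 10, Y = attitude score
N = 344
sum_x, sum_y = 1592, 1182
sum_x2, sum_xy, sum_y2 = 10231, 5374, 5258

# Normal equations: sum_xy = a*sum_x2 + C*sum_x and sum_y = a*sum_x + N*C
det = sum_x2 * N - sum_x * sum_x
a = (sum_xy * N - sum_x * sum_y) / det
C = (sum_x2 * sum_y - sum_x * sum_xy) / det

correction = sum_y ** 2 / N
ss_linear = a * sum_xy + C * sum_y - correction
ss_total = sum_y2 - correction

# Between-age sum of squares from the column totals of Table 114
group_sum_y = [118, 122, 114, 133, 107, 122, 127, 86, 99, 109, 45]
group_n = [29, 33, 34, 35, 39, 37, 39, 25, 32, 30, 11]
ss_groups = sum(t * t / k for t, k in zip(group_sum_y, group_n)) - correction

df_difference = len(group_n) - 2       # (groups - 1) minus 1 df for linear regression
df_within = N - len(group_n)
f_nonlinear = ((ss_groups - ss_linear) / df_difference) / (
    (ss_total - ss_groups) / df_within)
print(round(a, 6), round(C, 6), round(ss_linear, 3), round(f_nonlinear, 2))
```

The printed F of about 1.48, with 9 and 333 degrees of freedom, is the nonsignificant value reported in Table 115.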




A more rigid test consists of comparing the sum of squares for linear
regression with the sum of squares for quadratic regression. The quadratic
equation is of the form

$$Y = a_1X^2 + a_2X + C$$

where the constants $a_1$, $a_2$, and C are so taken that the sum of squares of
the residuals from this curve is a minimum, i.e.,

$$\sum\left(Y - a_1X^2 - a_2X - C\right)^2 = \text{a minimum}$$

TABLE 115. Analysis of Variance and Linear Regression

SOURCE OF               DEGREES OF     SUM OF       MEAN
VARIATION               FREEDOM        SQUARES      SQUARE     F
Age Groups                  10          49.040
  Linear Regression          1           3.231
  Difference                 9          45.809       5.09     1.48
Within                     333       1,147.553       3.45
Total                      343       1,196.593


When this expression is differentiated partially with respect to $a_1$, $a_2$, and
C, and in each case the first derivative is set equal to zero, the normal
equations are

$$\begin{aligned}
\sum X^2Y &= a_1\sum X^4 + a_2\sum X^3 + C\sum X^2\\
\sum XY &= a_1\sum X^3 + a_2\sum X^2 + C\sum X\\
\sum Y &= a_1\sum X^2 + a_2\sum X + NC
\end{aligned}$$

The sum of squares for quadratic regression is found from the expression

$$a_1\sum X^2Y + a_2\sum XY + C\sum Y - \frac{\left(\sum Y\right)^2}{N}$$

The analysis is shown in Table 116.


TABLE 116. Analysis of Quadratic Regression

                          DEGREES OF
SOURCE OF VARIATION       FREEDOM       SUM OF SQUARES
Quadratic Regression         2          a1ΣX²Y + a2ΣXY + CΣY − (ΣY)²/N
Quadratic Residuals        N − 3        ΣY² − a1ΣX²Y − a2ΣXY − CΣY
Total                      N − 1        ΣY² − (ΣY)²/N




The number of degrees of freedom for regression is two and for resid- 
uals is (N — 3). The significance of the quadratic may be tested by com- 
puting the mean squares in the usual manner and finding the ratio of the 
quadratic regression to the quadratic residuals. 

From the information concerning age and attitude toward farming
shown in Table 114, the additional information for solving the normal
equations can be readily obtained as follows:

N = 344        ΣY = 1,182      ΣX = 1,592       ΣX³ = 74,624      ΣXY = 5,374
ΣY² = 5,258    ΣX² = 10,231    ΣX⁴ = 584,995    ΣX²Y = 34,934

The normal equations become

$$\begin{aligned}
34{,}934 &= 584{,}995a_1 + 74{,}624a_2 + 10{,}231C\\
5{,}374 &= 74{,}624a_1 + 10{,}231a_2 + 1{,}592C\\
1{,}182 &= 10{,}231a_1 + 1{,}592a_2 + 344C
\end{aligned}$$

When these equations are solved for $a_1$, $a_2$, and C, the quadratic equation is

$$Y = 0.033323X^2 - 0.35102X + 4.06947$$

The sum of squares for quadratic regression is

$$(0.033323)(34{,}934) - (0.35102)(5{,}374) + (4.06947)(1{,}182) - \frac{(1{,}182)^2}{344} = 26.431$$
The advantage of the quadratic over the linear regression is shown in
Table 117.

TABLE 117. Advantage of Quadratic over Linear Regression

SOURCE OF               DEGREES OF     SUM OF       MEAN
VARIATION               FREEDOM        SQUARES      SQUARE      F
Quadratic Regression         2          26.431      13.215     3.85
Linear Regression            1           3.231       3.231
Advantage                    1          23.200      23.200     6.76
Quadratic Residuals        341       1,170.162       3.432
Total                      343       1,196.593

The significant advantage yields satisfactory proof of nonlinearity, but
does not necessarily imply that the quadratic equation is
the most desirable curve which might be chosen. 
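The three quadratic normal equations and the test of Table 117 can be reproduced in the same way. The sketch below is a minimal illustration, not the authors' method of solution: it uses a small Gaussian-elimination helper (hypothetical name `solve`), takes the linear sum of squares of 3.231 from the earlier analysis, and also forms the index of correlation discussed next. Working at full precision rather than with rounded constants, it agrees with Table 117 to about two decimal places.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                            # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


# Summary values for X = age - 10 and Y = attitude score (farm-boy data)
N = 344
sum_x, sum_x2, sum_x3, sum_x4 = 1592, 10231, 74624, 584995
sum_y, sum_xy, sum_x2y, sum_y2 = 1182, 5374, 34934, 5258

a1, a2, C = solve([[sum_x4, sum_x3, sum_x2],
                   [sum_x3, sum_x2, sum_x],
                   [sum_x2, sum_x, N]],
                  [sum_x2y, sum_xy, sum_y])

correction = sum_y ** 2 / N
ss_quadratic = a1 * sum_x2y + a2 * sum_xy + C * sum_y - correction
ss_total = sum_y2 - correction
ss_linear = 3.231                                    # from the linear analysis

ms_residual = (ss_total - ss_quadratic) / (N - 3)
f_advantage = (ss_quadratic - ss_linear) / ms_residual   # 1 and N - 3 df
index_of_correlation = (ss_quadratic / ss_total) ** 0.5
print(round(a1, 6), round(a2, 5), round(C, 5), round(f_advantage, 2))
```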

In any case, from information in Table 117, the correlation obtained
from this quadratic curve is low, as seen from the coefficient of correlation,
R, or index of correlation as it is more commonly called:

$$R = \sqrt{\frac{\text{S.S. Quadratic Regression}}{\text{S.S. Total}}} = \sqrt{\frac{26.431}{1{,}196.593}} = 0.149$$


For convenience the regression equation was developed by substituting 
a code age of 10 years less than the actual age. The quadratic equation 
$$Y = 0.033323X^2 - 0.35102X + 4.06947$$

is converted to original $X_1$-ages by

$$Y = 0.033323(X_1 - 10)^2 - 0.35102(X_1 - 10) + 4.06947$$

which simplifies to

$$Y = 0.033323X_1^2 - 1.01748X_1 + 10.9120$$


By substituting successive age levels in the quadratic regression equation
for $X_1$, predicted mean attitude toward farming can be shown for different
age levels as follows:

                    PREDICTED
YEARS OF AGE   MEAN ATTITUDE SCORE
     10               4.07
     11               3.75
     12               3.50
     13               3.32
     14               3.20
     15               3.15
     16               3.16
     17               3.24
     18               3.39
     19               3.61
     20               3.89


The age at which the attitude toward farming was least favorable was 
15.27 years, or 15 years and 3 months. Since the ages used were reported 
to the last birthday, probably a more accurate statement of the least 
favorable attitude age is 15 years and 9 months. 
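The conversion from coded to actual ages, and the age of least favorable attitude, follow from expanding the coded quadratic and setting its derivative to zero. A minimal sketch using the printed constants (variable and function names are illustrative):

```python
# Coded quadratic from the text: Y = a1*X**2 + a2*X + C, with X = age - 10
a1, a2, C = 0.033323, -0.35102, 4.06947
SHIFT = 10

# Substituting X = X1 - SHIFT and collecting powers of X1
b2 = a1
b1 = a2 - 2 * a1 * SHIFT
b0 = a1 * SHIFT ** 2 - a2 * SHIFT + C


def predicted_attitude(age):
    """Predicted mean attitude score at a given age in years."""
    return b2 * age ** 2 + b1 * age + b0


# dY/dX1 = 2*b2*X1 + b1 = 0 at the turning point (a minimum because b2 > 0)
least_favorable_age = -b1 / (2 * b2)
print(round(b1, 5), round(b0, 4), round(least_favorable_age, 2))
```

The turning point is about 15.27 years; adding half a year, because ages were recorded at the last birthday, gives roughly 15.77 years, the 15 years and 9 months stated above.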

Regression equations can be used for interpolation or for extrapolation.
Interpolation for ages 10 to 20 years in the quadratic equation just shown
appears satisfactory, but extrapolation to ages of less than 10 years or
of more than 20 years yields attitude scores not justified by the available
data. The generalization that the quadratic equation should not be used
for extrapolation can be ignored only when knowledge beyond the available
data in the area studied is sufficient to warrant such extrapolation.

Many times the quadratic regression may well be replaced by some
other curve more in line with known theory. Per pupil cost and size of
school is one example in which such replacement is appropriate. In
extremely small school systems the per pupil cost is high. As the school
increases in size, the per pupil cost decreases rapidly at first, and then
more and more slowly. For such regressions a curve is needed that
becomes asymptotic to a certain minimum. Certain log curves as well as the
curve produced by

$$Y = \frac{a}{X} + C$$

will satisfy such conditions.

A sample of small school districts maintaining high schools in Iowa was
drawn, all of which had an average daily attendance (ADA) for both
elementary and high school of less than 900. Ten schools were drawn at
random from those with an ADA of less than 100, ten from those with an
ADA of 100 but less than 200, and so on. In this manner, a sample of 90
schools was obtained. The mean per pupil costs are shown in Table 118.


TABLE 118. Mean Per Pupil Cost and ADA

                         PER PUPIL COST
ADA              N            Ȳ
Less than 100   10         274.60
100-199         10         270.00
200-299         10         223.90
300-399         10         192.90
400-499         10         193.50
500-599         10         205.40
600-699         10         206.30
700-799         10         200.60
800-899         10         190.70
Total           90         217.54
An analysis of variance—single classification was made and is shown 
in Table 119. The sum of the squares of 90 per pupil costs was 4,438,629. 
This sample of 90 small schools illustrates the well-known inference that 
per pupil cost is a function of the ADA. 


TABLE 119. Analysis of Variance—Single Classification of Per Pupil Cost
and ADA

SOURCE OF          DEGREES OF     SUM OF         MEAN
VARIATION          FREEDOM        SQUARES        SQUARE        F
ADA Groups              8         85,144.62     10,643.08     9.15
Within                 81         94,181.70      1,162.74
Total                  89        179,326.32

TABLE 120. Test of Significance of Nonlinearity of Per Pupil Cost and ADA

SOURCE OF              DEGREES OF     SUM OF         MEAN
VARIATION              FREEDOM        SQUARES        SQUARE       F
ADA Groups                  8         85,144.62
  Linear Regression         1         52,204.57
  Difference                7         32,940.05     4,705.72     4.05
Within Groups              81         94,181.70     1,162.74
Total                      89        179,326.32

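Because every ADA group contains ten schools, Tables 119 and 120 can be rebuilt from the group means of Table 118, the sum of squared costs (4,438,629), and the linear sum of squares of 52,204.57 given in Table 120. A minimal sketch under those inputs:

```python
# Group means of per pupil cost for the nine ADA classes (Table 118), 10 schools each
group_means = [274.60, 270.00, 223.90, 192.90, 193.50, 205.40, 206.30, 200.60, 190.70]
PER_GROUP = 10
SUM_SQUARED_COSTS = 4_438_629     # sum of the 90 squared per pupil costs
SS_LINEAR = 52_204.57             # sum of squares for linear regression (Table 120)

N = PER_GROUP * len(group_means)
grand_total = PER_GROUP * sum(group_means)
correction = grand_total ** 2 / N

ss_total = SUM_SQUARED_COSTS - correction
ss_groups = PER_GROUP * sum(m * m for m in group_means) - correction
ss_within = ss_total - ss_groups

df_groups = len(group_means) - 1
df_within = N - len(group_means)
ms_within = ss_within / df_within

f_groups = (ss_groups / df_groups) / ms_within                          # Table 119
f_nonlinear = ((ss_groups - SS_LINEAR) / (df_groups - 1)) / ms_within   # Table 120
print(round(ss_groups, 2), round(f_groups, 2), round(f_nonlinear, 2))
```

Equal group sizes are what allow the between-groups sum of squares to be computed from the means alone; with unequal groups each squared total would be divided by its own N, as in the farm-boy example.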



A linear regression was next utilized with the 90 schools. The needed
values are

N = 90              ΣY = 19,579
ΣX = 40,407         ΣY² = 4,438,629
ΣX² = 24,141,835    ΣXY = 8,230,636

When substitution is made, the normal equations are

$$\begin{aligned}
8{,}230{,}636 &= 24{,}141{,}835a + 40{,}407C\\
19{,}579 &= 40{,}407a + 90C
\end{aligned}$$

The sum of squares for regression, then, is 52,204.57. This is entered in
Table 120, together with the needed entries from the analysis of variance
shown in Table 119, as a test of significance of nonlinear relationship. The
significant F-value suggests that some nonlinear regression be utilized. A
quadratic regression was then found. The additional information needed for
the normal equations is

ΣX²Y = 4,812,258,072
ΣX³ = 16,360,843,687
ΣX⁴ = 11,838,001,267,144

The normal equations then are

$$\begin{aligned}
4{,}812{,}258{,}072 &= 11{,}838{,}001{,}267{,}144a_1 + 16{,}360{,}843{,}687a_2 + 24{,}141{,}835C\\
8{,}230{,}636 &= 16{,}360{,}843{,}687a_1 + 24{,}141{,}835a_2 + 40{,}407C\\
19{,}579 &= 24{,}141{,}835a_1 + 40{,}407a_2 + 90C
\end{aligned}$$

Upon solution the quadratic regression is

$$Y = 0.000268778X^2 - 0.3406196271X + 298.3735$$

which yields a sum of squares for regression of 72,464.99. This sum of
squares yields an index of correlation of 0.6357, as contrasted with the
coefficient of correlation of 0.5395 from the linear regression, a significantly
better prediction for the former as indicated by an F-value of 16.49.

Predicted per pupil costs from quadratic regression are shown below
for various ADA's. It can be seen that the lowest per pupil cost, as judged
from this evidence, is found with an ADA of 634. That per pupil costs
vary as indicated by quadratic regression, particularly if extrapolation to
larger ADA's is wanted, is out of line with common knowledge in school
administration. For the foregoing reasons, a curve resulting from an
equation of the form

$$Y = \frac{a}{X'} + C$$

has been chosen, where X' is equal to the average daily attendance. Data
from the school year 1949-1950 for 836 school districts are used to
illustrate the procedure.




ADA PREDICTED PER PUPIL COST (Quadratic Regression)
50 282.01 
100 267.00 
150 253.33 
200 241.00 
250 230.02 
300 220.38 
350 212.08 
400 205.13 
450 199.52 
500 195.26 
550 192.34 
600 190.76 
650 190.53 
700 191.64 
750 194.10
800 197.90 
850 203.04 
900 209.53 
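The predicted costs and the ADA of lowest cost come straight from the quadratic equation. This short sketch evaluates it at the tabled ADA values; the function name is illustrative.

```python
# Quadratic regression of per pupil cost on ADA for the sample of 90 schools
a1, a2, C = 0.000268778, -0.3406196271, 298.3735


def predicted_cost(ada):
    """Predicted per pupil cost (dollars) for a given average daily attendance."""
    return a1 * ada ** 2 + a2 * ada + C


# The parabola turns (its minimum, since a1 > 0) where 2*a1*ADA + a2 = 0
lowest_cost_ada = -a2 / (2 * a1)

for ada in range(50, 901, 50):
    print(ada, f"{predicted_cost(ada):.2f}")
```

Past the turning point near an ADA of 634 the parabola rises without limit, which is exactly the behavior the text finds out of line with experience in school administration.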


For convenience, 1,000 was divided by the ADA in each school, thus
making the equation read

$$Y = aX + C \qquad\text{where}\qquad X = \frac{1{,}000}{\text{ADA}}$$


The foregoing equation is now a linear function and may be solved as
usual. The needed values are as follows:

N = 836             ΣY = 194,068
ΣX = 4,578.80       ΣY² = 46,910,084
ΣX² = 35,461.0264   ΣXY = 1,147,085.16

The normal equations are

$$\begin{aligned}
1{,}147{,}085.16 &= 35{,}461.026a + 4{,}578.80C\\
194{,}068 &= 4{,}578.80a + 836C
\end{aligned}$$

which yield the equation


$$Y = 8.10652X + 187.74$$

The analysis of this regression is shown in Table 121. The correlation
between the reciprocals of ADA and per pupil cost is 0.6058. The curve
chosen is asymptotic to a minimum of $187.74.

Actually, then, the usable equation is

$$Y = \frac{\$8{,}106.52}{\text{ADA}} + \$187.74$$

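The reciprocal curve is fitted as an ordinary straight line on the transformed variable X = 1000/ADA. The sketch below computes the slope and intercept from deviation sums, which is algebraically equivalent to solving the two normal equations, and then the correlation between 1000/ADA and cost; the names are illustrative.

```python
# Summary values for 836 districts, 1949-1950: X = 1000 / ADA, Y = per pupil cost
N = 836
sum_x, sum_y = 4578.80, 194068
sum_x2, sum_y2, sum_xy = 35461.0264, 46910084, 1147085.16

# Deviation sums of squares and cross products
sxx = sum_x2 - sum_x ** 2 / N
syy = sum_y2 - sum_y ** 2 / N
sxy = sum_xy - sum_x * sum_y / N

a = sxy / sxx                        # slope on the reciprocal scale
C = (sum_y - a * sum_x) / N          # asymptotic minimum cost as ADA grows
r = sxy / (sxx * syy) ** 0.5         # correlation between 1000/ADA and cost


def predicted_cost(ada):
    """Per pupil cost implied by Y = a * (1000 / ADA) + C."""
    return a * 1000 / ada + C


print(round(a, 5), round(C, 2), round(r, 4))
```

Dividing the slope term by the intercept, 1000·a/C, gives the $43.18 of the one-dollar form below.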

Since this equation is based upon 1949-1950 data and school costs have
been changing so rapidly, perhaps a more satisfactory way to state the
relationship would be in terms of one dollar. Such an equation may be had
by dividing the two terms on the right-hand side by $187.74, making

$$Y = \frac{\$43.18}{\text{ADA}} + \$1.00$$

where

Y = per pupil cost for every dollar of per pupil cost in large cities

TABLE 121. Analysis of Regression of Reciprocal ADA

SOURCE OF        DEGREES OF     SUM OF        MEAN
VARIATION        FREEDOM        SQUARES       SQUARE
Regression            1          682,491      682,491
Residuals           834        1,176,889        1,411
Total               835        1,859,380

r = 0.6058     F = 483.69


Nonlinear relationships require, in addition to mathematical treatment,
a high degree of familiarity with the area under study in order that any
curve chosen may be in line with logical considerations.


Exercises 


1. Estimates of the average number of hours spent in study each week of the 
fall term were made by 70 university freshmen. In the following table are 
tabulated the study hour estimates and the fall term grade-point average for 
each student. 

a. Assuming a linear relationship, develop an equation for predicting fall
term grade-point averages to be expected from the reported number of
hours spent in study each week by freshmen. Is this a logical assumption
to make?

b. Test the hypothesis that, when a linear relationship is assumed, the fall
grade-point averages cannot be predicted from the estimated number of
hours spent in study each week by freshmen.

c. Assuming a quadratic relationship, find an equation for predicting fall
grade-point averages to be expected from the reported number of hours
spent in study each week by freshmen. Is the assumption of a
quadratic relationship more logical than the assumption of a linear
relationship? Why?

d. Test the hypothesis that, when a quadratic relationship is assumed, the
fall grade-point averages cannot be predicted from the estimated number
of hours spent in study each week by freshmen.

e. Determine whether significantly better prediction is attained by using 
the quadratic equation rather than the linear equation. 

f. Find the product-moment coefficient of correlation and index of correla-
tion between the grade-point averages and the estimated number of
hours spent in study per week. Interpret your conclusion in Exercise 1e
on the basis of these values.




                     HOURS OF                                   HOURS OF
       GRADE-POINT      STUDY                   GRADE-POINT       STUDY
STUDENT    AVERAGE   PER WEEK         STUDENT    AVERAGE   PER WEEK
      1       1.99         10              36       1.62         10
      2       2.33         18              37       2.96         25
      3       2.73         14              38       2.14         11
      4       3.59         16              39       3.43          9
      5       2.88         16              40       2.78         21
      6       1.42          8              41       3.62         15
      7       1.93         19              42       2.76         13
      8       1.76          9              43       2.27          8
      9       2.75         13              44       2.81         12
     10       3.13         12              45       1.39         18
     11       2.59         18              46       0.93          6
     12       1.72         20              47       2.57         18
     13       1.16         24              48       0.97         10
     14       2.43         10              49       2.89         16
     15       2.88         23              50       2.40         19
     16       1.50         15              51       3.03         18
     17       0.78          7              52       1.25        [?]
     18       3.31         10              53       3.23         11
     19       3.53         17              54       2.59         17
     20       2.50         10              55       1.20         19
     21       1.41         19              56       2.64         11
     22       1.60          9              57       1.16          8
     23       2.46         13              58       3.37         11
     24       0.83         18              59       2.36         10
     25       2.70         14              60       3.61         16
     26       2.58         12              61       1.36        [?]
     27       1.57          9              62       2.53         14
     28       3.25         13              63       1.79          6
     29       2.19          9              64       2.84         11
     30       1.83          8              65       1.89         17
     31       1.68         10              66       2.83         16
     32       2.60         21              67       1.48         11
     33       3.41         19              68       3.84         17
     34       3.75         15              69       2.69         15
     35       2.47         12              70       2.17         12


2. A group of 20 high school pupils in a beginning typing class was allowed 
a limited amount of time in which to type a lengthy exercise. Since none of 
the pupils was able to complete the exercise in the time allowed, the results were 
scored on the basis of the number of words typed, when corrected for errors. 
The exercise scores (Y) and the words per minute of typing speed (X ) attained 
by each pupil prior to the test are summarized below. 


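$$N = 20 \qquad \Sigma X^2 = 25{,}675 \qquad \Sigma Y^2 = 536{,}413$$
$$\Sigma Y = 2{,}535 \qquad \Sigma X^3 = 1{,}104{,}875 \qquad \Sigma XY = 105{,}755$$
$$\Sigma X = 665 \qquad \Sigma X^4 = 51{,}385{,}375 \qquad \Sigma X^2Y = 4{,}820{,}325$$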


a. Find the regression equation, assuming linearity, by means of which the
typing exercise scores can be predicted from known typing speed. Compute
and interpret the analysis of linear regression.




b. Determine the quadratic regression relating the typing exercise scores 
and the typing speed. Is this a more reasonable relationship to assume 
than the linear relationship? Why? 

c. Can a significantly better prediction be attained by using the quadratic 
equation rather than the linear equation? Is the result of the test of 
significance surprising? Would you anticipate the same results if the 
typing exercise were longer? Compute the index of correlation. 

d. Plot on graph paper predicted values of Y from known values of X, 
using both the linear and quadratic equations. Measure the Y-values on 
the ordinate and the X-values on the abscissa. On the basis of the two 
lines, which equation would you prefer to use for prediction purposes? 
If you were forced to extrapolate, which equation would you use? Are 
the constants in either equation totally logical? 

3. Cite the similarities and differences existing between the test of significance 
of the loss due to the elimination of a variable of a linear multiple regression 
scheme described in Chapter 13, and the test of significance of the advantage 
of a higher order parabola, such as a quadratic, over a linear regression equa- 
tion. For a given set of data, will the sum of squares for linear regression always 
be smaller than the sum of squares for quadratic regression? Why? 

4. On the basis of logical considerations only, what would probably be the 
nature of the appropriate curve representing the relationship between the fol- 
lowing pairs of variables? Sketch the regression line as it might appear. Suggest 
an equation to represent each relationship.

a. Size of community and number of superintendents hired during the last 
ten years. 

b. Age of adult male and finger dexterity score. 

c. Reading comprehension score and reading speed score of law college
students.

d. Number of pupils transported to school by bus and cost per pupil
transported.

e. Intelligence quotients of tenth-grade boys and number of false beliefs
held.


f. Final marks received by junior high school general science pupils and 
pupil ratings of the general science teacher. 

5. An investigator has found that the number of hours per week spent by
undergraduate students reading selected references in educational psychology was
quadratically related to their achievement in an educational psychology course.
He then wanted to predict achievement in the course ($Y$) by means of a regression
equation with three predictor variables: hours per week spent in reading ($X_1$),
a measure of scholastic aptitude ($X_2$), and final marks in general psychology
($X_3$). Linear relationships existed between $X_2$ and $Y$ and between $X_3$ and $Y$.
What assumptions, if any, must be made concerning the interregressions between
$X_1$ and $X_2$ and between $X_1$ and $X_3$? In view of your answer to the foregoing
question, what would be the general form of the multiple regression equation?
Write the normal equations which would be necessary to determine the coefficients
of the regression equation.


16


Other Techniques of Correlation Analysis


In the chapter dealing with the coefficient of correlation, the computa- 
tion and interpretation of this measure of relationship were discussed 
from a descriptive standpoint only. The standard error of the coefficient 
of correlation was described in the chapter concerned with the classical 
theory of sampling. Techniques for testing the significance from zero of 
multiple, zero-order, partial, and serial correlation coefficients were pre- 
sented in the regression and serial correlation chapters. Research situa- 
tions in education and psychology frequently require other techniques of 
correlation analysis than those previously discussed. Several techniques 
concerning product-moment correlation analysis will be presented in the 
first portion of this chapter, followed by a discussion of the phi coefficient, 
the tetrachoric correlation coefficient, and coefficients in other coarsely 
classified two-way distributions. 


PRODUCT-MOMENT CORRELATION ANALYSIS 


Frequently situations are encountered in which two coefficients of correlation
have been computed between measures of the same variables, but from independent
samples. The investigator may wish to know whether the difference between the
two coefficients of correlation is greater than might result through random
sampling from a single population. Such a test of significance cannot be
accomplished by direct comparison of the two coefficients, because the sampling
distribution of coefficients of correlation is not, in general, normal. Probably
the most satisfactory method so far proposed, although only an approximation, is
that suggested by Fisher.* He has shown that, for linearly related variables, the
sampling distribution can be brought close to normality by transforming
coefficients of correlation to Z-values by the formula

$$Z = \frac{1}{2}\log_e \frac{1+r}{1-r}$$

*R. A. Fisher, Statistical Methods for Research Workers, 2nd ed. (Edinburgh,
Oliver and Boyd, Ltd., 1925), pp. 163-171.
The Z in the notation for this transformation should not be confused with 
the z used to refer to the height of the ordinate in the normal curve of unit 
area. Some authors have used Z and others Z’ to indicate this function. 
The transformation of r’s to Z’s is appropriate in analyses involving statistical
inference, either estimation or hypothesis testing. Before illustrating the use of
the Z-transformation, some discussion of the reasons for its use may be in order.

[Figure 26. Sampling Distributions from Populations in which r = 0.00 and r = 0.80.
Horizontal axis: value of r, from −1.0 to 1.0; vertical axis: frequency.]

Reprinted from R. A. Fisher, Statistical Methods for Research Workers (Edinburgh,
Oliver and Boyd, Ltd., 1925), by permission of the author and publisher.

Whenever the coefficient of correlation in a population is some value
other than zero, the distribution of r’s computed from samples drawn
from this population is not normal. The farther the population coefficient
of correlation is from zero, in either the positive or the negative direction,
the more skewed the sampling distribution becomes. Figure 26 shows the
sampling distribution obtained from a population in which the true
coefficient of correlation is 0.00, and the sampling distribution obtained
from a population in which it is 0.80. Inspection of this figure shows that
the sampling distribution from the population in which the coefficient of
correlation is 0.00 is symmetrical and approximately normal, whereas the
sampling distribution from the population in which r = 0.80 is skewed in the
negative direction. The closer the population coefficient of correlation is to
1.00, the greater the skewness of the sampling distribution of r.

… increases, the shape of the distribution, for all practical purposes, remains
constant. The r-values, shown on the abscissa, however, become nearer …

The Z-values given by the formula

$$Z = \frac{1}{2}\log_e \frac{1+r}{1-r},$$

for transforming the sampling distributions of r’s to normality, can be
obtained by direct solution or by consulting the Appendix table of Z-values
corresponding to various r-values. It can be seen from this table that the
closer r is to 0.00, the smaller will be the change when r is transformed to Z.
As r becomes larger, however, the difference between r and Z becomes greater.
For example, when r is larger than about 0.762, the corresponding Z-value is
greater than unity.

The standard error of Z can be obtained from the formula

$$\sigma_Z = \frac{1}{\sqrt{N-3}}$$

and the standard error of the difference between two Z's from

$$\sigma_{Z_1 - Z_2} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}$$

Thus, the difference between two r’s can be evaluated for significance by 
transforming the r’s to Z-values and expressing the difference as a relative 
deviate of the normal curve. 
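The transformation and the two standard errors can be expressed directly in code. The following is a minimal Python sketch, and the function names are illustrative only. The inverse relies on the fact that solving Z = ½ log_e[(1 + r)/(1 − r)] for r gives r = tanh Z, so no table look-up is needed. The standard errors are σ_Z = 1/√(N − 3) and √[1/(N₁ − 3) + 1/(N₂ − 3)].

```python
import math

def fisher_z(r):
    """Fisher's transformation: Z = 1/2 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_to_r(z):
    """Inverse of the transformation: r = tanh(Z)."""
    return math.tanh(z)

def se_z(n):
    """Standard error of Z for a sample of n cases: 1 / sqrt(N - 3)."""
    return 1 / math.sqrt(n - 3)

def se_z_difference(n1, n2):
    """Standard error of Z1 - Z2 for two independent samples."""
    return math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

# Z exceeds unity only once r passes tanh(1), roughly 0.7616.
print(round(fisher_z(0.80), 4), round(z_to_r(1.0), 4))
```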
For example, an investigator determined that there was a correlation of
0.635 between high school grade-point average at graduation of 197 pupils …

$$r_1 = 0.635,\; Z_1 = 0.7498,\; N_1 = 197 \qquad r_2 = 0.480,\; Z_2 = 0.5230,\; N_2 = 243$$

$$\frac{Z_1 - Z_2}{\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}} = \frac{0.7498 - 0.5230}{\sqrt{\frac{1}{194} + \frac{1}{240}}} = 2.35$$


By consulting a table of the normal curve it can be seen that the probabil- 
ity of obtaining a difference in either direction as large or larger than this 
one, as a result of random sampling from a single population, is 0.0188. 
The null hypothesis is rejected since the probability accompanying the 
difference is smaller than the 5 per cent level. It should be noted that the 
foregoing test of significance is based upon independent samples. A simi- 
lar test for coefficients of correlation computed from the same sample will 
be discussed in a later portion of this chapter. 
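The same test for two independent samples is sketched below in Python. The helper name `compare_independent_r` is hypothetical. The two-tailed probability comes from the standard library's `statistics.NormalDist` rather than from a printed normal-curve table.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_independent_r(r1, n1, r2, n2):
    """Critical ratio and two-tailed probability for the difference between
    two r's computed from independent samples (Fisher's Z method)."""
    diff = fisher_z(r1) - fisher_z(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    critical_ratio = diff / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(critical_ratio)))
    return critical_ratio, p_two_tailed

cr, p = compare_independent_r(0.635, 197, 0.480, 243)
print(f"critical ratio = {cr:.2f}, two-tailed p = {p:.4f}")
```

The sketch works only for independent samples. Two r's computed on the same cases need the different procedure given later in the chapter.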

Since the standard error of Z is constant throughout the range of Z- 
values, the upper and lower fiducial limits for the population r can be 
computed with the use of the Z-transformation for an obtained r of any 
magnitude. An investigator obtained a coefficient of correlation of 0.80 
between size of vocabulary and test intelligence based upon 40 cases. 
What are the upper and lower fiducial limits of this obtained coefficient 
of correlation of 0.80? The Z-value corresponding to an r of 0.80 is 1.0986. 
The standard error of a Z-value based upon 40 cases is 


$$\sigma_Z = \frac{1}{\sqrt{N-3}} = \frac{1}{\sqrt{37}} = 0.164$$

and the normal deviate corresponding to the 5 per cent level is 1.96. Thus,
for the upper fiducial limit

Z = 1.0986 + (1.96)(0.164) = 1.4200, which corresponds to an r of 0.890,

and for the lower fiducial limit

Z = 1.0986 − (1.96)(0.164) = 0.7772, which corresponds to an r of 0.651.


It should be noted that after the upper and lower fiducial limits of the 
Z-value have been obtained they are reconverted to r-values for interpre- 
tation. Evidence of the degree of skewness in the sampling distribution of 
an r of this size is reflected in the difference of 0.090 between the sample 
r and the upper fiducial limit, and the corresponding difference of 0.149 
between the sample r and the lower fiducial limit. 
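The procedure for fiducial limits is: transform, add and subtract the deviate times σ_Z, then transform back. It can be sketched as follows. The function name `fiducial_limits` and its `deviate` parameter are our own. This sketch keeps σ_Z unrounded, so its limits may differ from hand computation in the last decimal place.

```python
import math

def fiducial_limits(r, n, deviate=1.96):
    """Lower and upper fiducial limits for a population r via Fisher's Z;
    deviate = 1.96 gives the 5 per cent level."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    half_width = deviate / math.sqrt(n - 3)
    # Limits are found on the Z scale, then converted back to r with tanh.
    return math.tanh(z - half_width), math.tanh(z + half_width)

lower, upper = fiducial_limits(0.80, 40)
print(f"lower = {lower:.3f}, upper = {upper:.3f}")
# The interval is asymmetric about r = 0.80 because the sampling
# distribution of r is skewed for large population values.
print(f"{0.80 - lower:.3f} below, {upper - 0.80:.3f} above")
```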

Fisher’s Z-distribution may also be used for the purpose of establishing
the probability that any obtained r differs from a postulated r of any
given size. For example, an r of 0.50 was obtained in a sample of 25 cases.
What is the probability of drawing such an r from a population in which
r = 0.90? The Z-values for these r’s are 0.5493 and 1.4722, respectively, and

$$\frac{1.4722 - 0.5493}{1/\sqrt{25 - 3}} = (0.9229)(4.690) = 4.329$$

The probability of a normal deviate as large as or larger than 4.329, when
evaluated by a normal curve table, is approximately 0.000015. Thus the chances
are fewer than 2 in 100,000 that a Z will differ by as much as or more than
0.9229 (1.4722 − 0.5493) from the Z of a population r of 0.90. If the
probability of the obtained correlation resulting from random sampling
fluctuations from a population r of 0.90 or larger is desired, the probability
is one-half of the value just computed, or approximately 0.0000075.

The test of the significance of a difference between two obtained coeffi-
cients of correlation described in a preceding section of this chapter will
suffice for comparing two r’s, but this procedure is inadequate when more
than two r’s are being contrasted simultaneously. The procedure for
determining whether two or more coefficients of correlation might have been
drawn from a common population involves the computation of chi square. 
In the following worksheet are shown the test-retest reliability coefficients 
of an examination dealing with number concepts and abilities obtained 
from five first-grade classes in separate schools. 


CLASS      N     N − 3     r        Z        Z(N − 3)    Z²(N − 3)
  1        21     18     0.930    1.6584     29.851       49.505
  2        31     28     0.965    2.0140     56.392      113.573
  3        19     16     0.927    1.6366     26.186       42.855
  4        16     13     0.920    1.5890     20.657       32.824
  5        13     10     0.886    1.4030     14.030       19.684
Total     100     85                        147.116      258.441


To determine the probability that these coefficients of correlation might
have been drawn from a common population, the totals in the worksheet
are substituted in the following formula:

$$\chi^2 = \Sigma[Z^2(N-3)] - \frac{[\Sigma Z(N-3)]^2}{\Sigma(N-3)} = 258.441 - \frac{(147.116)^2}{85} = 3.816$$

The resulting chi square is evaluated with degrees of freedom equal to the
number of coefficients of correlation minus one, in this case four degrees
of freedom. It can be seen from a table of chi square that a value of 3.816
is much smaller than the value of 9.488 required for significance at the 5 per
cent level.
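The worksheet arithmetic can be reproduced from the r's and N's alone. This is a minimal sketch: the function name is ours, and it carries Z to full precision instead of four decimals. It weights each Z by N − 3 and returns chi square together with its degrees of freedom, which equal the number of r's minus one.

```python
import math

def homogeneity_chi_square(rs, ns):
    """Chi square for the hypothesis that several r's from independent samples
    come from a common population; degrees of freedom = number of r's - 1."""
    zs = [0.5 * math.log((1 + r) / (1 - r)) for r in rs]
    weights = [n - 3 for n in ns]
    sum_w = sum(weights)
    sum_wz = sum(w * z for w, z in zip(weights, zs))
    sum_wz2 = sum(w * z * z for w, z in zip(weights, zs))
    chi_square = sum_wz2 - sum_wz ** 2 / sum_w
    return chi_square, len(rs) - 1

rs = [0.930, 0.965, 0.927, 0.920, 0.886]
ns = [21, 31, 19, 16, 13]
chi_square, df = homogeneity_chi_square(rs, ns)
print(f"chi square = {chi_square:.3f} with {df} degrees of freedom")
```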

Since the hypothesis of a common parent population was not rejected, 
the five correlations can logically be averaged to obtain evidence on which 
to base the standard error of the sampling distribution of this population. 
In averaging coefficients of correlation by the Z-transformation, the
products Z(N − 3) are summed for all samples and the result is divided by
Σ(N − 3). The mean Z in the foregoing example is therefore 147.116/85 = 1.7308,
which corresponds to an average r of 0.939. Fisher has proposed that the
standard error of an average Z can be obtained from the formula

$$\sigma_{\bar{Z}} = \frac{1}{\sqrt{\Sigma(N-3)}}$$

which here is 1/√85 = 0.1085. The upper and lower fiducial limits of the
coefficient of correlation of 0.939 obtained by averaging the five r’s in the
example are 1.7308 ± (1.96)(0.1085), that is, 1.9435 and 1.5181, which
correspond to r’s of 0.960 and 0.908. The procedure of averaging coefficients
of correlation described in the foregoing section may be employed without
making the chi-square test, provided an average correlation is desired which
has meaning without regard to fiducial limits. It is probably unsound, however,
to combine r’s for computing fiducial limits without evidence that a single
population is involved.
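Averaging through Z uses the same weights, N − 3. It can be sketched as below. The function name and the default deviate of 1.96 (the 5 per cent level) are our own choices. As the text cautions, the limits are meaningful only when the chi-square test has not rejected a common population.

```python
import math

def average_r(rs, ns, deviate=1.96):
    """Weighted average of r's through Fisher's Z (weights N - 3), with
    fiducial limits based on sigma of the average Z = 1 / sqrt(sum(N - 3))."""
    zs = [0.5 * math.log((1 + r) / (1 - r)) for r in rs]
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    return (math.tanh(z_bar),
            math.tanh(z_bar - deviate * se),
            math.tanh(z_bar + deviate * se))

r_bar, lower, upper = average_r([0.930, 0.965, 0.927, 0.920, 0.886],
                                [21, 31, 19, 16, 13])
print(f"average r = {r_bar:.3f}, limits {lower:.3f} to {upper:.3f}")
```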

All the foregoing tests of significance in correlation analyses are based 
upon the assumption of independent samples. Occasionally, a test of sig- 
nificance of the difference between two coefficients of correlation based 
upon the same sample is desired. For example, an investigator found a 
correlation of 0.34 between infant intelligence ratings (1) and socio-eco- 
nomic ratings of foster homes (Y). The socio-economic ratings of the 
foster homes were obtained from five to ten years after the infant ratings 
had been made. At the time the socio-economic rating of the home was
obtained, each child was administered an intelligence test (2) which was
found to correlate 0.44 with socio-economic ratings. The correlation between
the infant ratings and the retest intelligence test scores was 0.39
for the 80 subjects in the investigation. It is desired to determine the
significance of the difference between $r_{y1}$ and $r_{y2}$. The significance of the
difference between two such coefficients of correlation based on the same 
sample is obtained by evaluating F in the following formula which was 
developed by Hotelling.* 


$$F = \frac{(r_{y1} - r_{y2})^2 (N-3)(1 + r_{12})}{2\left(1 - r_{12}^2 - r_{y1}^2 - r_{y2}^2 + 2r_{12}r_{y1}r_{y2}\right)}$$

Upon substitution into the formula

$$F = \frac{(0.34 - 0.44)^2(80 - 3)(1 + 0.39)}{2[1 - (0.39)^2 - (0.34)^2 - (0.44)^2 + 2(0.39)(0.34)(0.44)]} = \frac{1.07}{1.31} = 0.82$$


Since the obtained F of 0.82 is far less than that required for significance, 
the null hypothesis is not rejected. 
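Hotelling's formula translates into a few lines of code. This is a minimal Python sketch, and the function name is hypothetical. We assume that F is referred to 1 and N − 3 degrees of freedom, since it is the square of Hotelling's t with N − 3 degrees of freedom. The quantity called `determinant` is the determinant of the three-variable correlation matrix.

```python
def hotelling_f(r_y1, r_y2, r_12, n):
    """Hotelling's F for the difference between r_y1 and r_y2 when both are
    computed on the same n cases; r_12 is the correlation between X1 and X2."""
    determinant = 1 - r_12**2 - r_y1**2 - r_y2**2 + 2 * r_12 * r_y1 * r_y2
    f = (r_y1 - r_y2) ** 2 * (n - 3) * (1 + r_12) / (2 * determinant)
    # Assumed degrees of freedom: 1 and N - 3 (F is the square of Hotelling's t).
    return f, (1, n - 3)

f, df = hotelling_f(0.34, 0.44, 0.39, 80)
print(f"F = {f:.2f} with {df} degrees of freedom")
```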


*Harold Hotelling, “The Selection of Variates for Use in Prediction with Some
Comments on the General Problem of Nuisance Parameters,” Annals of Mathe-
matical Statistics, Vol. XI, 1940, pp. 271-283.




The techniques of product-moment correlation analysis described in
this section are useful for situations in which all variables concerned are
numerically expressed continuous variables. Research situations requiring
measures of relationship between variables which are not expressed nu- 
merically require other procedures for analysis. 


PHI COEFFICIENT 


It has been pointed out that biserial correlation is an approximate esti- 
mation of product-moment correlation whenever one variable is numeri- 
cally expressed and the other variable is available in a dichotomy only. 
In certain other situations both variables are expressed in dichotomies, 
yielding a four-cell table from which an estimate of the coefficient of cor- 
relation is desired. 

The familiar chi-square technique provides a suitable test of signifi- 
cance concerning the probability of rejecting the hypothesis that no rela- 
tionship exists between two dichotomously expressed variables. In many 
situations in education and psychology, existing knowledge makes such a 
test entirely perfunctory. The research worker desires to know the degree 
of relationship existing in the group studied. For purposes of reporting an 
estimation of a coefficient of correlation in a four-cell table, the phi
coefficient and the tetrachoric correlation coefficient are employed.

The phi coefficient (φ) is obtained by the formula

$$\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$


where a, b, c, and d are the number of cases in each of the cells of a
fourfold table as follows:

      a    b
      c    d


The numerator of the formula is the difference in the products of the diag- 
onal cells and the denominator is the square root of the product of row 
and column subtotals. 


TABLE 122. Responses to a Certain Issue and Church Attendance

ATTENDED       ITEM RESPONSES
CHURCH         YES         NO          TOTAL
Yes            72 (a)      53 (b)      125
No             53 (c)      72 (d)      125
Total          125         125         250




For an illustration of the computation of the phi coefficient, the
information in Table 122 will be used. A sample of 250 adults was asked to
react as satisfied or not satisfied with present policy on a certain issue.
Disregarding church attendance, the number satisfied was equal to the
number not satisfied. The data shown, of course, are hypothetical. Seldom
would any study yield an exact fifty-fifty response.

From the foregoing formula for computing the phi coefficient,

$$\phi = \frac{(72)(72) - (53)(53)}{\sqrt{(125)(125)(125)(125)}} = 0.152$$

An inspection of Table 122 indicates that there was a tendency for those
who had attended church to react favorably toward the item and for
those who had not attended church to react unfavorably toward the item.
For convenience, then, a positive sign can be attached to this coefficient,
if the desired interpretation is that the tendency to attend church is
related to the tendency to react favorably to this item.

The phi coefficient solution does not demand that either or both of the
characteristics be variables. On the other hand, if this coefficient is to be
interpreted as a coefficient of correlation, three assumptions must be
satisfied. It must not be unreasonable to assume that (1) the tendency to
attend church is a characteristic which is normally distributed, although
not more accurately evaluated than by attendance or non-attendance on
a certain Sunday; (2) favorableness toward present policy on the issue
studied is a characteristic which is normally distributed, although not
more accurately evaluated than by a Yes-No response; and (3) the
relationship between the tendency to attend church and favorableness toward
present policy is linear.


TABLE 123. Response to an Aptitude Test Item and Later Probationary Status

CLEAR OF       ITEM RESPONSE
PROBATION      CORRECT     INCORRECT   TOTAL
Yes            361         59          420
No             39          41          80
Total          400         100         500

An example in which neither variable is split 50-50, which is the usual
situation, is shown in Table 123. The numbers answering an item correctly
and incorrectly on an aptitude test are shown for students who were and
who were not on probation. The information in this table yields a φ of
0.3409, which may be interpreted as a coefficient of correlation indicating
the degree of relationship between the tendency to avoid probation
and student aptitude as revealed by this one item. With the foregoing
statement, this phi coefficient carries a positive sign. It should be pointed
out that if the interpretation desired is between aptitude and the tendency
to be placed on probation, this phi coefficient would carry a negative
sign.
The mathematical relationship between the phi coefficient and chi
square is shown by the equation

$$\phi = \sqrt{\frac{\chi^2}{N}}$$
Thus, if chi square has been first computed, the corresponding phi coeffi- 
cient may be most easily found by extracting the square root of the quo- 
tient of chi square divided by the number of cases. If the phi coefficient 
has been first computed, its square multiplied by the number of cases 
produces the chi-square value. 
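The phi formula and its relation to chi square can be checked in a short Python sketch that uses the cell frequencies of Tables 122 and 123. We assume the ordinary uncorrected chi square; the relation χ² = Nφ² does not hold for a chi square with Yates' correction. The function names are illustrative only.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a fourfold table laid out as  a b / c d."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_from_phi(phi, n):
    """Uncorrected chi square for the same table: chi square = N * phi**2."""
    return n * phi ** 2

phi_122 = phi_coefficient(72, 53, 53, 72)    # Table 122
phi_123 = phi_coefficient(361, 59, 39, 41)   # Table 123
chi_123 = chi_square_from_phi(phi_123, 500)
print(f"{phi_122:.3f} {phi_123:.4f} {chi_123:.2f}")
# Going the other way: phi = sqrt(chi square / N)
print(f"{math.sqrt(chi_123 / 500):.4f}")
```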

As previously indicated, the phi coefficient may be interpreted as a co- 
efficient of correlation whenever the assumption is not unreasonable that 
both dichotomies are actually variables which are normally distributed 
and linearly related, if indeed any relationship exists. Whenever the phi 
coefficient is interpreted as a coefficient of correlation, it should be
recognized that it is an underestimate of the correlation which would ensue if
numerical values of each distribution were available. It has been suggested
that a more satisfactory estimate of relationship ensues if the phi
coefficient is divided by 0.637. Empirical evidence tends to confirm the
usefulness of this adjustment whenever the correlation is low and increasingly
to deny its usefulness as the magnitude of the relationship increases.

It is here proposed that the phi coefficient be adjusted by a changing 
divisor as the size of the phi coefficient increases. A table for converting 
phi coefficients into estimates of the corresponding coefficients of correla- 
tion is shown in the Appendix. This table was prepared from the formula 


$$r = \sin(\phi \cdot 90^\circ)$$


for various values of the phi coefficient. Some justification for the use of 
this formula will be shown in the discussion dealing with tetrachoric cor- 
relation. 

By the use of this conversion table, the coefficient of correlation 0.2365, 
corresponding to the phi coefficient of 0.152, indicates the relationship be- 
tween the tendency to attend church and favorableness toward the issue 
reported. In this example, because the relationship is low, the conversion
table value of 0.2365 differs but little from that obtained by the traditional
procedure of dividing the phi coefficient by the constant 0.637, which
produces a value of 0.239. The difference between the two procedures
becomes more pronounced in a fourfold table yielding a phi coefficient of
0.80. For such a table the Appendix table suggests a correlation of 0.951,
whereas division by 0.637 yields 1.256, an impossible value for a coefficient
of correlation.
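The two conversions can be compared with the sketch below, which tabulates both for the two φ values discussed. It evaluates r = sin(φ · 90°) directly instead of reading the Appendix table. The function name is hypothetical.

```python
import math

def phi_to_r(phi):
    """Proposed estimate of the correlation from phi: r = sin(phi * 90 degrees)."""
    return math.sin(math.radians(phi * 90))

for phi in (0.152, 0.80):
    print(f"phi = {phi:.3f}: sine rule {phi_to_r(phi):.4f}, "
          f"phi / 0.637 = {phi / 0.637:.3f}")
```

The sine rule can never exceed 1, because sin 90° = 1 at φ = 1. The constant divisor exceeds 1 as soon as φ passes 0.637.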


TETRACHORIC CORRELATION 


Tetrachoric correlation may be used to report the relationship between 
two variables, each of which is expressed in a dichotomy. The tetrachoric 
correlation coefficient was first proposed by Karl Pearson¹ in 1905. He
developed a formula for obtaining this coefficient in which the right-hand
member was an infinite series as follows:

$$\frac{ad - bc}{N^2 z z'} = r + \frac{hk}{2}r^2 + \frac{(h^2 - 1)(k^2 - 1)}{6}r^3 + \frac{h(h^2 - 3)\,k(k^2 - 3)}{24}r^4 + \cdots$$

where ad − bc is the difference in the products of the frequencies in the
diagonal cells; z and z' are the heights of the ordinates separating the
segments of the dichotomies; and h and k are the normal deviates
corresponding to z and z'.

Because the solution of the foregoing is so laborious, it has been the
practice in education and psychology to disregard all but the first two
terms in the right-hand member. This abbreviated formula may be solved
by the usual method for quadratic equations. For low values of r the
abbreviated formula is quite satisfactory, but for high values of r it
yields an overestimation of the relationship. The foregoing limitations
have been overcome, to some extent, by the use of the graphs developed
by Thurstone² for the solution of tetrachoric correlation. Pearson’s
unwieldy formula becomes somewhat more usable whenever p = q in each
dichotomy, and may then be written as:

$$r_t = \sin\left(\frac{ad - bc}{N^2}\,360^\circ\right)$$

Slaichert³ has recently developed an empirical solution for adjusting
(ad − bc)/N² whenever p does not equal q in either dichotomy or both. This
adjustment makes it possible to use the foregoing formula for approximations
of tetrachoric correlations. The Slaichert correction factors for adjusting
for unequal numbers of cases differ but slightly from the square root of the
ratio of the product of the row and column subtotals in equal dichotomies to
the product of such totals in unequal dichotomies. Thus his correction
factors approximate

$$\sqrt{\frac{\left(\frac{N}{2}\right)^4}{(a+b)(c+d)(a+c)(b+d)}} = \frac{N^2}{4\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$

The sine formula may then be made to read

$$r_t = \sin\left[\frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}\,90^\circ\right]$$

Since the expression in the brackets is the phi coefficient, the formula here
proposed for tetrachoric correlation is r_t = sin(φ 90°), which was previously
suggested in the discussion of the phi coefficient. For convenience, an
Appendix table has been prepared in which this equation has been solved for
r_t for various values of φ.

¹ Karl Pearson, “On the Correlation of Characters Not Quantitatively Measurable,”
Philosophical Transactions of the Royal Society, Series A, Volume 195, London,
1905, pp. 1-47.

² L. Cheshire, M. Saffir, and L. L. Thurstone, Computing Diagrams for the
Tetrachoric Correlation Coefficient (University of Chicago Bookstore, Chicago, 1933).

³ William M. Slaichert, Techniques for Estimating the Coefficient of Correlation
from a Fourfold Table. Unpublished Ph.D. thesis (Ames, Iowa, Iowa State College
Library, 1951).
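The derivation can be verified numerically with the sketch below. It makes three checks. The equal-split formula sin[(ad − bc)/N² · 360°] agrees with sin(φ 90°) when all subtotals equal N/2. Multiplying (ad − bc)/N² by the margin factor √[(N/2)⁴/product of subtotals] gives exactly sin(φ 90°). The sketch also applies the result to Table 123. The margin factor is the approximation stated in the text, not Slaichert's own empirical factors. All function names are hypothetical.

```python
import math

def tetrachoric_equal_split(a, b, c, d):
    """Pearson's formula when p = q in both dichotomies: sin((ad - bc)/N^2 * 360 deg)."""
    n = a + b + c + d
    return math.sin(math.radians((a * d - b * c) / n ** 2 * 360))

def tetrachoric_sine(a, b, c, d):
    """Proposed approximation for any fourfold table: r_t = sin(phi * 90 deg)."""
    product = (a + b) * (c + d) * (a + c) * (b + d)
    phi = (a * d - b * c) / math.sqrt(product)
    return math.sin(math.radians(phi * 90))

def tetrachoric_with_margin_factor(a, b, c, d):
    """Equal-split formula with (ad - bc)/N^2 multiplied by the factor
    sqrt((N/2)^4 / product of subtotals) that approximates Slaichert's corrections."""
    n = a + b + c + d
    product = (a + b) * (c + d) * (a + c) * (b + d)
    factor = math.sqrt((n / 2) ** 4 / product)
    return math.sin(math.radians((a * d - b * c) / n ** 2 * factor * 360))

print(f"{tetrachoric_equal_split(72, 53, 53, 72):.4f} {tetrachoric_sine(72, 53, 53, 72):.4f}")
print(f"{tetrachoric_sine(361, 59, 39, 41):.4f}")
```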

It should be further noted that the method here shown, as well as 
Pearson’s original formula, demands that neither distribution be muti- 
lated such as would occur if the upper and lower one-fourth of a class 
constituted the dichotomous categories with the middle half eliminated 
from the analysis. Some attention* has been given to the analysis of such
mutilated distributions but the use of such procedure, in most cases, would 
be more time-consuming than the inclusion of all the data using the 
method here shown. Insofar as all the data are retained for the computa- 
tion of the tetrachoric correlation coefficient, its significance from zero 
can be tested by evaluating the significance of the corresponding chi 
square, but this procedure is inappropriate whenever the distribution has 
been mutilated. 


MULTICELL CORRELATION 


In the chapter dealing with the coefficient of correlation, an appropriate 
technique was shown for obtaining the relationship in a two-way
frequency distribution, under the assumption that a sufficient number of class
intervals is available for each distribution. The method for obtaining a 
correlation coefficient in a four-cell table has just been described. Neither 
of these methods is appropriate in situations in which the number of seg- 
ments lies between these two extremes. 

Little evidence can be found to indicate the number of class intervals 
necessary to justify the usual technique for product-moment correlation. 
Certainly a 3 × 5 table should not be so analyzed. It is unfortunate that
suitable techniques for correlation analysis with coarsely classified distri- 
butions have received so little attention since many situations in educa- 
tion and psychology are necessarily so classified. 

Whenever the number of intervals is sufficiently large in each distribution,
the usual formula for product-moment correlation,

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y},$$

is entirely satisfactory. With broad categories, however, this formula yields
an underestimate of existing correlation. Peters and Van Voorhis¹ have
developed a formula for broad categories,

$$r = \frac{\Sigma xy}{N\sigma_x^2\sigma_y^2},$$

which they suggest as suitable whenever measures are centered about means of
intervals in a normal distribution of unit area and unit standard deviation.
They further indicate that the coefficient obtained by this formula is an
overestimate of existing correlation, particularly whenever the correlation
is high. The true correlation obviously is somewhere between the values
obtained from the two foregoing formulas.

Rather than use the formula $r = \Sigma xy/(N\sigma_x^2\sigma_y^2)$, it is here
suggested, particularly for high relationships, that the formula
$r = \Sigma xy/(N\sigma_x\sigma_y)$ be employed and the resulting coefficient be
adjusted for coarse grouping by suitable correction factors. A table of such
correction factors² is shown in the Appendix.

*C. C. Peters and W. R. Van Voorhis, Statistical Procedures and Their Mathe-
matical Bases (New York, McGraw-Hill Book Co., Inc., 1940), p. 382.
For a preliminary explanation of this technique, a 2 × 2 table will be
used. From the information shown in Table 123, the phi coefficient, with
its resulting estimate of correlation, has been found between the tendency
to remain free from scholastic probation and the response to a single item
on an aptitude test. The same situation will serve the purpose of presenting
a general method useful in any multicell classification as well as in
a 2 × 2 classification.

As a first step, numerical values of 1 and 0 are assigned to “yes” and “no”
on probation status, respectively, and 1 and 0 to correct and incorrect
aptitude item response, respectively. If the usual formula for the
product-moment correlation, $r = \Sigma xy/(N\sigma_x\sigma_y)$, is used, the
correlation is 0.3409, identical with the phi coefficient. These identical
values are not fortuitous; the two computations are mathematically identical.
The assignment of 1 and 0 to the categories, an arbitrary decision, is perhaps
not so satisfactory as the assignment of values based upon normal curve
deviates as revealed by the frequencies in the dichotomies. Thus, of the 500
students, 420, or 84 per cent, were free from scholastic probation. The mean
sigma value, z/p, obtained from a table of the normal curve, is 0.28965 for the
nonprobationers. The mean sigma value for the 16 per cent representing
probationers is −1.52069. In the same manner, the mean sigma values for
correct and incorrect responses to the aptitude item are found to be 0.34995
and −1.39981, respectively. In solving for r with mean sigma values for the
segments, it is convenient to prepare a work table such as that shown in
Table 124. The entry in parentheses in each cell is the number of cases. The
other entry is the product of the mean sigma values, (z/p)(z/p), for the row
and column of that cell.

¹ Ibid., pp. 393-399.

² The correction factor for a single dichotomous distribution is $\sqrt{r/\phi}$,
where $r = \sin(\phi\,90^\circ)$. For other segmentations the shape of the
distribution is identical, with the maximum correction, at zero correlation, of
$\sqrt{1/\Sigma[(z_{i-1} - z_i)^2/p_i]}$. For a dichotomy this reduces to
$\sqrt{pq}/z$. The correction, when p = q, reduces to a maximum of 1.253. For
other segmentations the maximum correction is also given by the foregoing
expression, where the p’s are determined by the binomial theorem.


TABLE 124. Information Needed for Estimating Correlation
between Probation Status and Item Response

CLEAR OF          ITEM RESPONSE
PROBATION         CORRECT       INCORRECT       z          z/p
Yes               0.101363*     −0.405455                  0.28965
                  (361)         (59)
                                                0.24331
No                −0.532165     2.128677                   −1.52069
                  (39)          (41)
z                         0.27996
z/p               0.34995       −1.39981

* (0.28965)(0.34995) = 0.101363.


The $\sum xy$ is

$$\sum xy = 361(0.101363) + 59(-0.405455) + 39(-0.532165) + 41(2.128677) = 79.191$$

The $\sigma^2$ for a dichotomized variable is $\dfrac{y^2}{pq}$. Thus, for probation status

$$\sigma_x^2 = \frac{(0.24331)^2}{(0.84)(0.16)} = 0.44048$$

and for item response

$$\sigma_y^2 = \frac{(0.27996)^2}{(0.80)(0.20)} = 0.48986$$

which yield $\sigma$'s of 0.6637 and 0.6999, respectively.

The usual formula for product-moment correlation is $r = \dfrac{\sum xy}{N\sigma_x\sigma_y}$. When
the needed values are substituted in the formula,

$$r = \frac{79.191}{(500)(0.6637)(0.6999)} = 0.3409$$


This value agrees with the phi coefficient obtained by assigning values of 
unity and zero to the segments in a 2 X 2 table. The foregoing method 


OTHER TECHNIQUES OF CORRELATION ANALYSIS 307 


does have an advantage over the method of obtaining the phi coefficient
in that it may be extended to distributions classified into more than two
segments. The obtained coefficient, however, is an underestimate of the
existing relationship. In a 2 X 2 table an estimate of the correlation may
be made by considering r as φ and consulting the Appendix table of
r = sine (φ 90°).
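The claim that the mean-sigma scoring and the phi coefficient give the same value for Table 124 can be checked numerically. The Python sketch below is illustrative only; `dichotomy_values` and `phi_coefficient` are hypothetical names, and the normal ordinates come from the standard library's `statistics.NormalDist` rather than a printed table, so the last decimal may differ slightly from the hand computation. It scores each dichotomy as y/p and -y/q, takes σ² = y²/(pq), and compares the resulting r with phi from the four cell counts.

```python
from math import sqrt
from statistics import NormalDist


def dichotomy_values(p):
    """Score a dichotomy by mean normal deviates.

    p is the proportion in the upper segment. Returns the mean deviate of the
    upper segment (y/p), of the lower segment (-y/q), and the variance
    y**2 / (p*q), where y is the normal ordinate at the point of dichotomy.
    """
    nd = NormalDist()
    q = 1 - p
    ordinate = nd.pdf(nd.inv_cdf(q))  # the upper segment lies above z = inv_cdf(q)
    return ordinate / p, -ordinate / q, ordinate ** 2 / (p * q)


def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Table 124: rows are clear of probation (yes, no); columns are the
# aptitude item response (correct, incorrect).
a, b, c, d = 361, 59, 39, 41
n = a + b + c + d

x_hi, x_lo, var_x = dichotomy_values((a + b) / n)  # 420 of 500 clear of probation
y_hi, y_lo, var_y = dichotomy_values((a + c) / n)  # 400 of 500 correct

# Cell frequency times the product of the row and column scores.
sum_xy = a * x_hi * y_hi + b * x_hi * y_lo + c * x_lo * y_hi + d * x_lo * y_lo
# Both scored variables have mean zero, so r = sum_xy / (N sigma_x sigma_y).
r = sum_xy / (n * sqrt(var_x * var_y))
phi = phi_coefficient(a, b, c, d)

print(f"r from mean sigma values = {r:.4f}")
print(f"phi coefficient          = {phi:.4f}")
```

Because any two-valued scoring of a dichotomy is a linear transformation of the 1-and-0 scoring, the two printed values agree, both near 0.341.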

The correction factors in a 2 X 2 table, obtained from r = sine (φ 90°),
may be readily extended to variables which are classified into more than
two segments. The table shown in the Appendix for correcting for coarse
grouping may be employed for relationships based upon variables
classified into ten segments or fewer. If more than ten segments are
available for any given variable, the correction factor may be disregarded,
or considered as unity.

For the example shown, the relationship between probation status and 
item response can be satisfactorily estimated by adjusting the phi coeffi- 
cient by the correction factors shown in the Appendix table. Thus, 


r = (0.3409) (1.224) (1.224) = 0.5107. 


which is in close agreement with the correlation indicated from the pro- 
posed method for tetrachoric correlation. 
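The dichotomy correction can be sketched in Python under one stated assumption: that the Appendix factor for a single dichotomized variable is √(r/φ) with r = sine (φ 90°), as the footnote and the tabled 1.224 suggest, so that applying it to both variables turns φ into sine (φ 90°). The Appendix table itself is not reproduced here, and `dichotomy_correction` is a hypothetical name.

```python
from math import pi, sin, sqrt


def dichotomy_correction(phi):
    """Assumed coarse-grouping correction for one dichotomized variable.

    Returns sqrt(r / phi) with r = sin(phi * 90 degrees). As phi approaches
    zero the factor rises to its maximum, sqrt(pi / 2), about 1.253.
    """
    if phi == 0:
        return sqrt(pi / 2)
    return sqrt(sin(phi * pi / 2) / phi)


phi = 0.3409
factor = dichotomy_correction(phi)
r_adjusted = phi * factor * factor  # the same as sin(phi * 90 degrees)
print(f"factor = {factor:.4f}, adjusted r = {r_adjusted:.4f}")
```

Computed this way the factor is about 1.223 rather than the tabled 1.224, so the adjusted value lands near 0.510 instead of exactly 0.5107; the difference reflects rounding in the table.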

The foregoing method may well be used for a greater number of cate- 
gories than a 2 x 2 table but less than the number of categories required 
for the satisfactory use of the usual formula for product-moment correla- 
tion. The minimum number of categories for replacement of the latter 


TABLE 125. Chemistry and Mathematics Marks for Engineering Freshmen

MATHEMATICS            CHEMISTRY MARK
MARK           A     B     C     D     F    TOTAL
A             10     7     5     3     0      25
B              9    17    14     8     2      50
C              7    19    40    25     9     100
D              4     6    14    16    10      50
F              0     1     7     8     9      25
Total         30    50    80    60    30     250


technique demands an arbitrary decision. It may be that the standard 
proposed by Peters and Van Voorhis¹ of not less than about ten categories
in either distribution for the use of product-moment correlation is not too 
rigid for the most careful analyses. 

An example of the application of the proposed formula is taken from
the marks of A, B, C, D, and F in two different freshman courses in
engineering. The information shown in Table 125 has been altered somewhat


¹ Peters and Van Voorhis, op. cit., p. 393.




for the purpose of ease in explanation. One method of obtaining the rela- 
tionship between marks in mathematics and chemistry is to assign values 
of 4, 3, 2, 1, and 0 to the marks and compute a product-moment coefficient 
as was done in the chapter on correlation. When this procedure is followed,
a coefficient of 0.4416 is obtained, which should be recognized as an
underestimate of the existing relationship.
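For comparison, the arbitrary-score coefficient can be reproduced with a short Python sketch (the names `scored_r`, `TABLE_125`, and `SCORES` are hypothetical). It treats each cell of Table 125 as a frequency and applies the ordinary product-moment formula to the assigned values 4, 3, 2, 1, and 0.

```python
from math import sqrt

# Table 125: rows are mathematics marks A-F, columns chemistry marks A-F.
TABLE_125 = [
    [10, 7, 5, 3, 0],
    [9, 17, 14, 8, 2],
    [7, 19, 40, 25, 9],
    [4, 6, 14, 16, 10],
    [0, 1, 7, 8, 9],
]
SCORES = [4, 3, 2, 1, 0]  # arbitrary values for A, B, C, D, F


def scored_r(table, row_scores, col_scores):
    """Product-moment r for a frequency table with assigned category scores."""
    n = sx = sy = sxx = syy = sxy = 0
    for row, x in zip(table, row_scores):
        for f, y in zip(row, col_scores):
            n += f
            sx += f * x
            sy += f * y
            sxx += f * x * x
            syy += f * y * y
            sxy += f * x * y
    cov = sxy / n - (sx / n) * (sy / n)
    var_x = sxx / n - (sx / n) ** 2
    var_y = syy / n - (sy / n) ** 2
    return cov / sqrt(var_x * var_y)


print(round(scored_r(TABLE_125, SCORES, SCORES), 4))  # 0.4416
```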

It is convenient in obtaining the coefficient of correlation by the proposed
method to prepare a work table as shown in Table 126. As usual, the four
z-values separating each distribution into five segments are obtained from
the table of the normal curve. These entries are found exactly as they were
in the 2 X 2 table analysis. In each cell, the number in parentheses is the
number of cases. The other entry is xy, which is obtained by multiplying
the $\dfrac{y_1 - y_2}{p}$ for the mathematics mark by the $\dfrac{y_1 - y_2}{p}$ for the chemistry
mark.
The $\sum xy$ needed for the solution of the formula is

$$\sum xy = 10(2.9256) + 7(1.3827) + 5(0.0905) + 3(-1.2729) + \cdots + 9(2.9256) = 101.1498$$

The $\sum \dfrac{(y_1 - y_2)^2}{p}$ for mathematics marks is

$$\frac{(0.1755)^2}{0.1} + \frac{(0.1722)^2}{0.2} + \frac{(0.0000)^2}{0.4} + \frac{(-0.1722)^2}{0.2} + \frac{(-0.1755)^2}{0.1} = 0.9126$$

In a similar manner the $\sum \dfrac{(y_1 - y_2)^2}{p}$ for chemistry marks equals 0.9182.


The solution of the formula, ignoring coarse grouping, then is

$$r = \frac{101.1498}{250\sqrt{0.9126}\sqrt{0.9182}} = 0.442$$
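The whole procedure for Table 125 can be sketched in Python under the text's assumptions (normally distributed underlying traits, linear relationship). This is an illustrative implementation, not the authors' worksheet: `segment_means` and `coarse_r` are hypothetical names, and the ordinates come from `statistics.NormalDist`, so Σxy and the variances may differ from the hand values in the last decimal. No coarse-grouping correction is applied.

```python
from math import sqrt
from statistics import NormalDist


def segment_means(counts):
    """Mean normal deviate (y1 - y2)/p of each segment, highest segment first.

    y1 is the ordinate at a segment's lower boundary and y2 the ordinate at
    its upper boundary (zero in either tail). Every segment must be nonempty.
    """
    nd = NormalDist()
    n = sum(counts)
    means, y_upper, cum = [], 0.0, 0.0
    for i, count in enumerate(counts):
        p = count / n
        cum += p
        last = i == len(counts) - 1
        # The boundary with upper-tail area `cum` lies at z = inv_cdf(1 - cum).
        y_lower = 0.0 if last else nd.pdf(nd.inv_cdf(1 - cum))
        means.append((y_lower - y_upper) / p)
        y_upper = y_lower
    return means


def coarse_r(table):
    """sum(xy) / (N sigma_x sigma_y), each segment scored by its mean deviate."""
    row_n = [sum(row) for row in table]
    col_n = [sum(col) for col in zip(*table)]
    n = sum(row_n)
    x, y = segment_means(row_n), segment_means(col_n)
    sum_xy = sum(f * x[i] * y[j]
                 for i, row in enumerate(table) for j, f in enumerate(row))
    # Scores have mean zero, so sigma**2 = sum(p * mean**2) = sum((y1 - y2)**2 / p).
    var_x = sum(k * m * m for k, m in zip(row_n, x)) / n
    var_y = sum(k * m * m for k, m in zip(col_n, y)) / n
    return sum_xy / (n * sqrt(var_x * var_y))


# Table 125: rows are mathematics marks A-F, columns chemistry marks A-F.
TABLE_125 = [
    [10, 7, 5, 3, 0],
    [9, 17, 14, 8, 2],
    [7, 19, 40, 25, 9],
    [4, 6, 14, 16, 10],
    [0, 1, 7, 8, 9],
]

math_means = segment_means([sum(row) for row in TABLE_125])
print("mathematics segment values:", [round(m, 3) for m in math_means])
print(f"r, ignoring coarse grouping = {coarse_r(TABLE_125):.3f}")  # about 0.442
```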


The correction factors shown in the Appendix table indicate an adjusted
coefficient of

$$r = (0.4420)(1.039)(1.039) = 0.4771$$

The coefficient of correlation obtained by the formula here proposed differs
but little from that obtained by the product-moment coefficient with
arbitrary values of 4, 3, 2, 1, and 0 assigned to A, B, C, D, and F,
respectively. In many 5 X 5 tables in which the cases are not so
symmetrically distributed, the coefficients obtained from the two methods
would not be in such close agreement.

An example in which the method here described is particularly useful 
may be noted in the information shown in Table 127. If the relationship 
is desired between the opinion of male farm-reared high school seniors 
concerning migration from the farm and their reaction, on a five-point

TABLE 126. Information Needed for Estimating Correlation between Marks in Mathematics and Chemistry [table printed sideways; cell values not legible in this copy]
TABLE 127. Migration Tendency and Farming Attitude of Farm Boys

                    “FARMING IS A DESIRABLE WAY TO MAKE A LIVING”
MIGRATION     STRONGLY                                STRONGLY
INTENTION      AGREE    AGREE   UNCERTAIN   DISAGREE  DISAGREE   TOTAL
Stay             15       17        6           2         0        40
Uncertain         6        8        8           6         2        30
Leave            19       25       36          32        18       130
Total            40       50       50          40        20       200


scale, to an item stated as “Farming Is a Desirable Way to Make a Liv- 
ing,” the proposed technique should be employed. The technique, of 
course, is predicated on the assumptions, which are not too unreasonable, 
that (1) the tendency to migrate from the farm is a characteristic which
is normally distributed although not more accurately evaluated than by
a senior's response of “staying,” “doubtful,” or “leaving”; (2) attitude
toward farming is a characteristic which is normally distributed although 
no more accurately evaluated than indicated by the five-point response 
to this item of “strongly agree,” “agree,” “uncertain,” “disagree,” or 
“strongly disagree”; and (3) any relationship between migration tendency 
and farming attitude can be assumed to be linear. The values to be as- 
signed to each of the three categories of migration intention as well as the 
values for each of the five categories of farming attitude can be found in 
a table of the normal curve of unit area as shown in Table 128. 


The $\sum xy$ needed in the solution is

$$\sum xy = (15)(1.96000) + (17)(0.64854) + (6)(-0.26947) + \cdots + (18)(1.00009) = 53.42265$$

The $\sum \dfrac{(y_1 - y_2)^2}{p}$ for migration tendency is

$$\frac{(0.28000)^2}{0.20} + \frac{(0.09040)^2}{0.15} + \frac{(-0.37040)^2}{0.65} = 0.65755$$

and for farming attitude is

$$\frac{(0.28000)^2}{0.20} + \frac{(0.11581)^2}{0.25} + \frac{(-0.04812)^2}{0.25} + \frac{(-0.17219)^2}{0.20} + \frac{(-0.17550)^2}{0.10} = 0.91116$$


From the foregoing values, ignoring coarse grouping,

$$r = \frac{53.42265}{200\sqrt{0.65755}\sqrt{0.91116}} = 0.3451$$

The correction factors with three- and five-segment variables for an
unadjusted correlation of 0.3451 are 1.099 and 1.042, respectively. Thus,

$$r = 0.3451(1.099)(1.042) = 0.3952$$
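The same approach carries over to Table 127, with three segments for migration intention and five for farming attitude. This Python sketch repeats the hypothetical helpers so that it stands alone; the correction factors 1.099 and 1.042 are simply quoted from the text, since the Appendix table is not reproduced here, and computed ordinates may shift the fourth decimal slightly.

```python
from math import sqrt
from statistics import NormalDist


def segment_means(counts):
    """Mean normal deviate (y1 - y2)/p of each segment, highest segment first."""
    nd = NormalDist()
    n = sum(counts)
    means, y_upper, cum = [], 0.0, 0.0
    for i, count in enumerate(counts):
        p = count / n
        cum += p
        last = i == len(counts) - 1
        # The boundary with upper-tail area `cum` lies at z = inv_cdf(1 - cum).
        y_lower = 0.0 if last else nd.pdf(nd.inv_cdf(1 - cum))
        means.append((y_lower - y_upper) / p)
        y_upper = y_lower
    return means


def coarse_r(table):
    """sum(xy) / (N sigma_x sigma_y), each segment scored by its mean deviate."""
    row_n = [sum(row) for row in table]
    col_n = [sum(col) for col in zip(*table)]
    n = sum(row_n)
    x, y = segment_means(row_n), segment_means(col_n)
    sum_xy = sum(f * x[i] * y[j]
                 for i, row in enumerate(table) for j, f in enumerate(row))
    var_x = sum(k * m * m for k, m in zip(row_n, x)) / n  # scores have mean zero
    var_y = sum(k * m * m for k, m in zip(col_n, y)) / n
    return sum_xy / (n * sqrt(var_x * var_y))


# Table 127: rows stay / uncertain / leave; columns strongly agree ... strongly disagree.
TABLE_127 = [
    [15, 17, 6, 2, 0],
    [6, 8, 8, 6, 2],
    [19, 25, 36, 32, 18],
]

r = coarse_r(TABLE_127)
r_adjusted = r * 1.099 * 1.042  # factors quoted from the text's Appendix table
print(f"unadjusted r = {r:.4f}, adjusted r = {r_adjusted:.4f}")
```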


TABLE 128. Information Needed for Estimating Correlation between Migration Tendency and “Farming Is a Desirable Way to Make a Living” [table printed sideways; cell values not legible in this copy]


SERIAL CORRELATION 


The methods here described may also be used with serial correlation. If
the relationship is low, the coefficients are almost identical with those
obtained from the techniques described in the chapter on serial correlation.
As the relationship increases in magnitude, those techniques overestimate
the degree of relationship and should be replaced in careful research by
the more conservative methods of this chapter. On the other hand, if the
usual methods of serial correlation have been used, the obtained
coefficients may be multiplied by

$$\sqrt{\sum \frac{(y_1 - y_2)^2}{p}}$$

and the correction factor applied to the product.

An example will illustrate the procedure. The usual biserial correlation
technique yielded a coefficient of 0.513 between I.Q. and graduation
tendency for a group of 198 high school pupils. If this coefficient of
correlation is multiplied by $\sqrt{\sum \dfrac{(y_1 - y_2)^2}{p}}$, or its equivalent in a
dichotomy, $\dfrac{y}{\sqrt{pq}}$, which in this situation is 0.7092, the resulting product
is 0.364. The correction factor for 0.364 is 1.219, which yields an
estimated coefficient of correlation of 0.444. Both the product, 0.364, and
the adjusted value, 0.444, are coefficients of correlation. The former
indicates the relationship if a dichotomously expressed variable is
predicted from a numerical variable, whereas the latter indicates the
intrinsic relationship existing between two variable characteristics, one
of which is dichotomously expressed. As the number of segments in which
variables are expressed is increased, these values of r become identical.

Significance of Correlation in Coarsely Grouped Distributions. In many
studies involving coarsely grouped variables, a test of the significance
of the difference from zero is desired for the linear relationship. Chi
square has often been employed for this purpose, even though a chi-square
value is independent of the order in which the segments are arranged. A
more appropriate test may be made from the usual formula for the
significance from zero of a product-moment coefficient of correlation,
i.e.,

$$t = \sqrt{\frac{r^2(N - 2)}{1 - r^2}}$$

where r is the coefficient obtained prior to adjusting for coarse
grouping.




From the information shown in Table 127 concerning the relationship
between intentions of remaining on the farm and attitude toward farming,
an unadjusted coefficient of correlation of 0.3451 was found. If this value
is substituted in the foregoing test of significance,

$$t = \sqrt{\frac{(0.3451)^2(200 - 2)}{1 - (0.3451)^2}} = 5.17$$
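A short Python sketch of this test follows (the function name `t_for_r` is hypothetical). As in the text, it is applied to the coefficient before any adjustment for coarse grouping.

```python
from math import sqrt


def t_for_r(r, n):
    """t for testing whether r differs from zero, with N - 2 degrees of freedom.

    Written as r * sqrt((N - 2) / (1 - r**2)), which equals the square-root
    form in the text but keeps the sign of r.
    """
    return r * sqrt((n - 2) / (1 - r * r))


t = t_for_r(0.3451, 200)  # unadjusted r from Table 127
print(f"t = {t:.2f} with {200 - 2} degrees of freedom")  # t = 5.17 with 198 degrees of freedom
```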

Summary of Coarsely Grouped Variables. The use of the proposed 
method for obtaining a coefficient of correlation is predicated upon the 
assumptions of normal distributions and linear relationship. It should be 
evident that, with a nonnumerical variable, the distribution of cases yields 
no evidence of nonnormality in the characteristic evaluated. For example, 
the assumption of normality in a distribution of adult male heights is not 
questioned. Yet if heights are classified as short, average, and tall, the
number of cases in each segment will depend upon someone's definition of
these segments.

Decisions concerning normality and linear relationship should be the
responsibility of specialists in the applied field rather than of the
statistician. The latter may be inclined to analyze a 3 X 5 table by the
chi-square technique even though two continuous characteristics are
involved. It should be noted that chi square is appropriate whenever the
categories of a distribution can be arranged in no better order than a
random one.

With regard to linearity of relationship, the judgment of competent 
authority in the field should be obtained. For example, if diets are classi- 
fied as good, average, or poor, and weights are classified as overweight, 
normal, or underweight, the assumption of linearity may be questioned. 
Perhaps a good diet is associated with normal weights, and poor diet with 
either overweight or underweight. The statistician must defer to the nu- 
trition specialist concerning the assumptions of both linear relationships 
and normality. It is possible to obtain some evidence of linearity from 
the data alone as the number of segments in a variable is increased. 


COEFFICIENTS IN NONVARIABLE DISTRIBUTIONS 


Two-way frequency tables include two distributions, both, one or nei- 
ther of which may be variable characteristics. When both distributions 
are variables the usual product-moment correlation technique is appro- 
priate except when the variables are coarsely grouped. For such coarsely 
grouped distributions, the methods described in this chapter are used. In 
any such analysis, two values may be reported. One value represents the 
relationship in the group studied. The coefficient of correlation, or an esti- 
mate of its value in coarsely grouped distributions, serves this purpose. 
Another value represents the confidence that can be placed in the
relationship as indicative of the relationship in a population from which the group
studied may be considered a random sample. The latter value is obtained 




from the t-test in which the probability of obtaining a correlation in a
sample from an uncorrelated population may be evaluated. In certain
situations either of the foregoing statistical measures may yield
interpretations ranging from highly useful to ridiculous. For example, a
coefficient of correlation based on five cases is almost without meaning.
On the other hand, a t-test of the null hypothesis that no relationship
exists between the scores on the odd and even items on a scholastic
aptitude examination is ridiculous to the research worker in education and
psychology. Between these two extreme situations, the decision as to which
of the two approaches is the more meaningful becomes debatable. In case of
doubt, the research worker should report both.

Point Biserial Correlation. If one of the distributions is a variable and 
the other a nonvariable, a t-test of the difference between two means 
will provide a satisfactory test of significance. A statistical value indicat- 
ing the degree of relationship, however, becomes difficult to conceive. 
Such a statistical measure, called point biserial r, is sometimes employed 
when one distribution is numerically expressed and the other occurs in a 
two-group nonvariable characteristic. For example, one distribution may 
be sex of adults and another their heights. If a test of the null hypothesis 
is desired that there is no difference between the average heights of adult 
men and women, the t-test of the difference between two means can be 
applied. If a description of height as a sex characteristic is desired, one 
of three methods of reporting has been employed. The first method
consists of reporting the difference between the mean heights of men and
women in the group studied. A second method consists of presenting in a 
table the heights of men and women in a frequency distribution. This 
method has the advantage of presenting, inaccurate though it may be, 
the amount of overlapping in the distribution of heights. A third method 
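consists of reporting a point biserial r from the formula

$$r_{pb} = \frac{M_p - M_q}{\sigma_t}\sqrt{pq}$$

where $M_p$ and $M_q$ are the means of the numerical variable in the two
groups, $p$ and $q$ are the proportions of cases in the two groups, and
$\sigma_t$ is the standard deviation of the total distribution.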

This coefficient, like the coefficient of correlation, varies from zero to 
unity and may carry a positive or a negative sign. Interpretation of the 
point biserial r is difficult whenever the two-group distribution is a
nonvariable, yet that is precisely the condition required for the
appropriate use of the formula. It is doubtful whether the formula should
be used except in situations in which the difference in means is not a
more satisfactory interpretation.
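The point biserial formula is algebraically the ordinary product-moment r with the two groups scored 1 and 0, which a small Python sketch can confirm. The heights below are invented for illustration and are not data from the text; the function names are hypothetical. σ_t must be the population standard deviation (divisor N) for the two results to agree.

```python
from math import sqrt
from statistics import fmean, pstdev


def point_biserial(values, groups):
    """r_pb = (M_p - M_q) / sigma_t * sqrt(p * q); groups holds 1 or 0 per case."""
    group_p = [v for v, g in zip(values, groups) if g == 1]
    group_q = [v for v, g in zip(values, groups) if g == 0]
    p = len(group_p) / len(values)
    q = 1 - p
    # sigma_t is the standard deviation of the total distribution (divisor N).
    return (fmean(group_p) - fmean(group_q)) / pstdev(values) * sqrt(p * q)


def pearson(x, y):
    """Ordinary product-moment r."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: heights in inches, 1 = man, 0 = woman.
heights = [70, 68, 72, 66, 64, 62, 65, 63]
sex = [1, 1, 1, 1, 0, 0, 0, 0]

r_pb = point_biserial(heights, sex)
print(f"point biserial r                 = {r_pb:.4f}")
print(f"product-moment r with 0/1 coding = {pearson(heights, sex):.4f}")
```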

The use of point biserial r in test item analysis is questionable since 
each item presumably has been designed to indicate the degree of the 
same characteristic as the total test. It should be recognized that al- 
though responses to a single item may not be more accurately evaluated 
than as correct and incorrect, the number of individuals making correct 




and incorrect responses is a function of item difficulty rather than any 
evidence of normality. In this respect, test item responses are similar to 
a distribution of heights of male adults. If these heights are classified
as tall and short, the distribution of cases depends upon the proposed 
standard for classification. The percentages of cases listed as tall and 
short will be quite different if the standard is 5 feet 3 inches, 5 feet 8
inches, or 6 feet 2 inches. The assumption of normality of heights of male
adults cannot be justified on the basis of percentage distribution. 

Questionnaire items, in most cases, although perhaps requiring a Yes- 
No response, are designed to yield upon combination a variable charac- 
teristic. Thus economic status may be evaluated by a series of items 
requiring a Yes-No response such as “Do you own an automobile?” “Do 
you own a radio?” and “Do you own a television set?” Relationships 
between any numerical distribution and response to such items can be 
defended as variable characteristics more easily than as nonvariable
characteristics, and the point-biserial coefficient, if computed, should be
adjusted in such situations by the tabled values of corrections for coarse 
grouping shown in the Appendix. 

Coefficient of Contingency. In certain situations both distributions are 
nonvariable characteristics such as denominational church membership 
and political party affiliation. In such situations, the appropriate test of 
significance is chi square. If a descriptive index is desired, the coefficient
of contingency may be found from the formula

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$$


The coefficient of contingency indicates the degree to which the frequencies
depart from a random assignment to the cells, that is, the tendency for
certain characteristics in one distribution to be associated with certain
characteristics in the other distribution. Roughly, the value of the
coefficient of contingency varies from zero to some value approximating
unity. The maximum possible value depends upon the number of cells in the
distribution. Thus, in a 2 X 2 table the maximum coefficient of contingency
is 0.71, and in a 10 X 10 table it is 0.95.
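A minimal Python sketch of the coefficient of contingency follows (the function names are hypothetical). For a square k × k table its ceiling is √((k − 1)/k), which reproduces the 0.71 and 0.95 quoted above; a perfectly associated 2 × 2 table reaches that ceiling.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi square for a two-way table of frequencies."""
    row_n = [sum(row) for row in table]
    col_n = [sum(col) for col in zip(*table)]
    n = sum(row_n)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_n[i] * col_n[j] / n
            total += (observed - expected) ** 2 / expected
    return total


def contingency_coefficient(table):
    """C = sqrt(chi**2 / (N + chi**2))."""
    chi2 = chi_square(table)
    n = sum(sum(row) for row in table)
    return sqrt(chi2 / (n + chi2))


def max_contingency(k):
    """Largest C attainable in a square k x k table."""
    return sqrt((k - 1) / k)


print(round(max_contingency(2), 2), round(max_contingency(10), 2))  # 0.71 0.95
print(round(contingency_coefficient([[50, 0], [0, 50]]), 3))  # 0.707
```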

The use of the coefficient of contingency is extremely limited in educa- 
tion and psychology since human characteristics generally are variable 
rather than nonvariable traits. Once again it seems appropriate to em- 
phasize that statistical treatment of data as variable or nonvariable 
requires intimate familiarity with the theory in the applied field rather
than with the theory of statistical methodology.


Exercises 


1. Test the significance of the difference between the following coefficients of 
correlation: 




(a) 0.430, N = 45          (b) 0.760, N = 214
    0.334, N = 52              0.713, N = 311


2. What are the fiducial limits, in terms of r, of a coefficient of correlation of
0.849 based on 62 cases?


3. What is the probability that the following coefficients of correlation were 
obtained from the same population? 


r = 0.678, N = 71 
r = 0.649, N = 84 
r = 0.593, N = 35 
r = 0.482, N = 92 
4. a. What is the average coefficient of correlation from the data in Exercise 3?
b. What are the fiducial limits of this average coefficient of correlation?

5. Test the significance of the difference between $r_{12}$ and $r_{13}$.

$r_{12} = 0.53$    N = 63
$r_{13} = 0.65$
$r_{23} = 0.33$

6. What is the tetrachoric correlation coefficient for the following data? 


VOTED IN LAST              IN FAVOR OF BOND ISSUE
SCHOOL BOARD ELECTION        YES      NO     TOTAL
Yes                           69      14       83
No                            17      32       49
Total                         86      46      132


7. An investigator asked a school superintendent to classify the economic
status of the families of a graduating class according to the following scheme:
III, could completely finance child's college education without sacrifice; II, could
completely finance child's college education by making some sacrifices; I, could
give only partial financial support to child's college education; 0, could give no
consistent financial support to child's college education.

Five years later the members of the class were classified according to their 
tendency to complete college. The following two-way distribution then resulted: 


COLLEGE                       FINANCIAL STATUS
ATTENDANCE                0      I     II    III    TOTAL
Did not matriculate      10     17     19      6      52
Matriculated              3     15     14      2      34
Graduated                 2      9     11      2      24


a. Compute the coefficient of correlation.
b. Is this correlation significantly different from zero? 


8. Find the correlation coefficient between apprentice teaching final marks 
and personality ratings of the 140 students classified in the following table. 


                              FINAL MARKS
PERSONALITY             A     B     C     D     F
RATING          CODE    4     3     2     1     0    TOTAL
Excellent         3    11    13     9     0     0      33
Good              2     8    27    20     1     0      56
Fair              1     4     4    15     6     2      31
Poor              0     0     2     6     7     5      20
Total                  23    46    50    14     7     140


17 


Statistical Techniques in Measurement


Test scores represent one of the major sources of evidence in educa- 
tional and psychological research. Even the most cursory examination of 
the current literature in the social sciences will attest to the contribution 
of testing in this area. The body of knowledge in both education and 
psychology has been greatly expanded through the development of meth- 
ods for measuring those aspects of behavior with which each field is 
concerned. 

For purposes of this chapter, measurement in education and psychology 
has been subdivided into several areas. Educational and psychological 
tests will first be classified according to their purpose. The characteristics 
of a perfect measuring instrument will then be discussed. Selected pro- 
cedures for statistically determining reliability and validity will be pre- 
sented, followed by a discussion of statistical techniques in item analysis. 

Many types of educational and psychological research problems in- 
volve the administration of tests and the subsequent statistical analysis 
of the test results. In some instances the investigator may be able to 
select tests which have been previously developed and use these in the 
solution of a research problem. For example, if the problem involved 
determining the relation between intelligence and age at which children 
began to talk, it would probably be unnecessary for the investigator to 
develop a new instrument for measuring intelligence. Rather he would 
be more likely to employ one of the widely used tests of intelligence 
already available to him. Selection of tests for a particular problem and 
subsequent interpretation of their usefulness require familiarity not only
with the availability of the tests but also with the statistical techniques
of analyzing tests.

Frequently no suitable measuring instrument is available and it be- 
comes necessary to construct a new device. For example, if a measure of 



cautiousness is required in an investigation, it might be necessary to con- 
struct a test for this purpose. In this instance, familiarity with the tech- 
niques used in analyzing and refining tests is essential. 


CLASSIFICATION OF TESTS ACCORDING 
TO PURPOSE 


From the standpoint of purpose, tests may be classified into three general 
groups: aptitude tests, achievement tests, and “personality” tests. Apti- 
tude tests are developed for the purpose of predicting the degree of 
achievement to be expected from individuals in given activities. The 
major category of aptitude tests may be subdivided into tests of general 
aptitude and tests of special aptitude. In general, these two categories 
are based upon the degree of specificity involved in the predicted activity. 
The most common examples of general aptitude tests are the tests of 
intelligence as well as the scholastic aptitude tests which are frequently 
given to college freshmen as a part of pre-registration test batteries. Tests 
of special aptitude include those designed to predict success in individual 
academic courses such as geometry or algebra. Further examples include 
the tests which have been designed for relatively unique or limited 
aspects of behavior such as mechanical, clerical, and manual activities, 
or special talents such as musical or artistic ability. 

Achievement tests are instruments devised to indicate pupil attainment 
resulting from formal educational instruction or from informal educa- 
tional environments with which students come in contact. Achievement 
tests are not confined to those tests which measure only the amount of 
informational content which a pupil possesses in specified subject-matter 
areas. Rather, achievement testing should parallel all the educational 
objectives which have been designated for each educational experience. 
The procedure of evaluating the amount of information which can be
recalled and, from evidence concerning only this specific objective,
inferring the totality of pupil achievement is no longer adhered to in
educational theory, even though it may still be followed to some extent in
educational practice.

The group of tests classified as “personality” includes such miscella- 
neous tests as those of personal adjustment, values, interests, and atti- 
tudes. Perhaps this group of tests should have been called miscellaneous 
rather than “personality.” Construction of tests of this type is highly 
specialized and requires considerable experience with and considerable 
formal training in these aspects of behavior. A detailed discussion of this 
type of test in all its applications is beyond the purpose of this book. Since
this category of tests is frequently involved in educational and psycho- 
logical research problems, however, recognition of this category is desir- 
able. The foregoing classification of tests according to purpose is only 




one of many classifications of tests which might logically be made. The 
foregoing classification serves to emphasize the varied purposes of tests 
which are used in education and psychology. 


CHARACTERISTICS OF A PERFECT MEASURING 
INSTRUMENT 


The satisfactoriness of any instrument of evaluation must always be 
appraised in terms of the purpose for which it is constructed.
Characteristics of a satisfactory instrument cannot be thought of apart from
the purpose for which the instrument is constructed. Regardless of pur- 
pose, however, it is possible to postulate a number of characteristics of 
measuring instruments which will apply to all measurement situations. 
Such characteristics then become standards or criteria with which to judge 
any given instrument. 

The following are postulated characteristics of a perfect measuring 
instrument without regard to their relative importance: 


1. A perfect instrument should be administratively feasible to construct and 
use. 

2. A perfect instrument should be adapted to the range of the characteristic 
to be evaluated. 

3. A perfect instrument should be calibrated so that the units of measurement 
are equal. 

4. A perfect instrument should yield absolute rather than relative readings, 
or scores. 

5. A perfect instrument should be sensitive. 

6. A perfect instrument should be available in duplicate form. 

7. A perfect instrument should be accompanied by norms or standards for
interpretation.


8. A perfect instrument should yield readings, or scores, free from error. 


Of course it is true that an instrument could be highly satisfactory for 
certain purposes and yet be far from the perfect instrument which has 
been postulated in the foregoing statements. 

The first characteristic of a perfect instrument is administrative feasi- 
bility. The necessity for giving careful consideration to this characteristic 
is easily recognized. The time available for scoring a test will
occasionally predetermine the type of instrument to be used, whereas in
other cases the immediacy of need will require an instrument that can be
constructed in the shortest period of time. This is the situation
confronting the classroom teacher
who must decide what type of test to administer in the evaluation of 
pupil achievement. Another example is that of an investigator who must 
decide between the administration of an individual intelligence test and
a group test of intelligence. If the available time for the investigation is
limited, it may be more feasible to administer the group test.

An even more important example of this characteristic of administrative
feasibility is apparent in the substitution of indirect evaluation for
direct evaluation. For example, suppose the present temperature is to be 
determined. Physicists define temperature in terms of the energy of
molecules. An instrument to measure the energy of molecules directly, 
even if such an instrument could be constructed, might not be adminis- 
tratively feasible. Consequently, an instrument is constructed which in- 
dicates how much expansion takes place in a volume of mercury. The 
thermometer then becomes an administratively feasible means of meas- 
uring temperature. 

The substitution of an indirect for a direct instrument to accomplish 
administrative feasibility may be accompanied by serious error. In the 
thermometer little difference is found between a measurement made by an
indirect instrument and that which would be obtained from a direct
instrument. There may be serious discrepancy, however, between scores 
on a paper-and-pencil test and direct appraisal of the behavior that the 
test is designed to measure. There are few instances in education and 
psychology which will illustrate direct measurement since almost no direct 
evaluation of behavior is to be had. Nevertheless, the statement cannot 
be overemphasized that the substituting of an indirect for a direct in- 
strument is justified only when readings, or scores, are produced by the 
former which are sufficiently similar to those produced by the latter to 
satisfy the purposes of the evaluation. 

The second characteristic of the perfect instrument is that it can be 
adapted to the range of the characteristic to be evaluated. For example, 
the mercury thermometer is not satisfactory for temperatures lower than 
40° below zero, since mercury solidifies slightly before that low a tem- 
perature is reached. In educational and psychological evaluation the same 
principle is noted, in that a test of simple addition and subtraction is not 
a suitable instrument for evaluating relative skill in arithmetic of pupils 
in the secondary school. Anyone who constructs a test must recognize the 
appropriate level of performance of the subjects to whom the test will 
be administered. Successive revisions of a test based upon analysis of the 
responses of subjects to the individual items of a test make this charac- 
teristic relatively attainable by a test constructor, however. 

The third characteristic of the perfect instrument is that it will yield 
readings, or scores, which may be graduated into equal units. The meas- 
urement of length by using the inch as the unit of measure is an excellent 
example of the attainment of this characteristic. An inch is an inch at all 
times. The difference between 12 inches and 13 inches is exactly equal to
the difference between one mile plus one inch and one mile plus two inches.
In other words, the unit of measurement is constant throughout the range 
to be evaluated. 

For some time it has been recognized that educational and psycho-
logical evaluations are in terms of so-called rubber units. They may or
may not be equal throughout the range. For example, if upon an examina-
tion student A makes a score of 19, student B a score of 20, student C a
score of 61, and student D a score of 62, is the difference between stu-
dents A and B equal to the difference between students C and D? Un-
fortunately, no one knows. A difference of one score point at the lower
end of the range may be entirely unlike a similar difference at the middle
or at the upper end of the range. In educational and psychological evalu-
ation, it is not demanded that this characteristic be attained in the con-
struction of instruments.

The fourth characteristic of a perfect instrument is that it will yield 
absolute rather than relative readings, or scores. This is another way of 
saying that the location of the zero point of the characteristic being 
evaluated and of the zero point on the scale should coincide. For example, 
if an object is 16 inches long, it is twice as long as an object which is 8
inches long. An object weighing ten pounds is just ten pounds heavier than
an object having no weight at all. Instruments to evaluate length and
weight produce absolute readings, or scores. Contrast this type of reading,
or score, with that obtained from the measurement of behavior. For
example, if a student makes a score of zero on an arithmetic test, does
this mean that he has no achievement in arithmetic? If student A makes a
score of 10 and student B a score of 20 on an arithmetic test, does
student B know twice as much as student A? Unfortunately, no one knows.

This limitation is not so serious as first thought might lead one to 
believe. With the centigrade or Fahrenheit scales, the zero point on the 
scale does not coincide with the zero point of temperature—absolute 
zero. Even though this is known, satisfactory results are still obtained 
by using these scales. True, it cannot be said that it is twice as hot when 
the temperature is 72° as it is when the temperature is 36°. The only
limitations which are placed upon interpretations when the zero point of 
the scale and the zero point of the characteristic do not coincide are (1) 
the impossibility of interpreting ratios between readings or scores and 
(2) the impossibility of inferring from the test scores the location of the 
point of no amount of the characteristic being measured. 
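Converting the two readings to an absolute scale shows why ratios break down when the zero point is arbitrary. The Python sketch below assumes the standard conversion from degrees Fahrenheit to kelvins (a scale whose zero is absolute zero). It compares the ratio of the raw 72° and 36° readings used above with the ratio of the corresponding absolute temperatures.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert a Fahrenheit reading to kelvins, a scale whose zero is absolute zero."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

warm, cool = 72.0, 36.0

# Ratio of the raw readings: 2.0, but the Fahrenheit zero point is arbitrary.
reading_ratio = warm / cool

# Ratio of the amounts measured from a true zero point.
absolute_ratio = fahrenheit_to_kelvin(warm) / fahrenheit_to_kelvin(cool)

print(f"ratio of Fahrenheit readings:   {reading_ratio:.2f}")   # 2.00
print(f"ratio of absolute temperatures: {absolute_ratio:.3f}")  # 1.073
```

Measured from absolute zero, the 72° reading represents only about 7 per cent more heat than the 36° reading. A test score with an unknown zero point is in the same position.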

The fifth characteristic of a perfect instrument is sensitivity. No one
would attempt to weigh a diamond on cattle scales, since cattle scales
obviously are not sensitive enough for that purpose. It is equally obvious
that if an experiment is carried on with various types of foods for
babies, using weekly gain in weight as a criterion, an instrument is
needed which is more sensitive than cattle scales would be. That an
instrument should be sufficiently sensitive for the desired purpose is
evident so long as weights and measures of material objects are being
considered. It is unfortunate that the limited interpretations which can
be made from a crude instrument are not so apparent when attempts are
made to evaluate behavior. Experiments have been conducted over a
period of a week or a month in which no differences were reported; yet
in actual fact, there may have been appreciable differences which could
not be detected with the crude tests used as criteria. There are any
number of tests on which the normal growth of a student for an entire
year is represented by two, three, or four units on the scale which the
examination yields. Obviously, a test of this type lacks the necessary
sensitivity to serve as a criterion for evaluating experimental procedures
for the duration of a week or a month, particularly with the small number
of subjects usually available for experimentation. Investigators must
realize that interpretations of educational and psychological experiments
cannot be made on the basis of “analytical balance sensitivity” when the
outcomes are evaluated with “cattle-scale crudity.”

The sixth characteristic of a perfect instrument is that it should be 
available in duplicate form. If, for any reason, a second measurement of
temperature is needed, a thermometer is read a second time. With the
measurement of human behavior, however, the second measurement in-
volves difficulty not apparent in a unitary instrument like a thermometer.
Since items are sampled in the construction of a test, a subject may pur-
posely change his behavior in anticipation of a repetition of the same
test. It is, therefore, advantageous to have duplicate test forms. Logi-
cally, the second form should be equivalent, but should contain a different
sample of items. Duplicate forms are also useful (1) for the purpose of 
measuring change in whatever characteristic is measured by the test; (2) 
for the purpose of minimizing the number of proctors needed in test 
administration by assigning different forms to adjacent subjects; and 
(3) for the purpose of make-up tests in the case of failure to take a test 
when it is administered in a group situation, due to illness or other causes. 

The seventh characteristic of a perfect instrument is that it will be
accompanied by appropriate norms or standards for interpretation. The
usefulness of a thermometer would be decidedly minimized if it were not
for generally known norms for its reading. Thus, 32° F. is the tempera-
ture at which water freezes; 72° F. is a healthful room temperature;
212° F. is the temperature at which water boils. These standards or norms
give meaning to thermometer readings expressed in units of Fahrenheit
degrees. Similar standards or norms are needed to give meaning to test
scores. Accompanying most published tests are norms which have been
developed from the administration of the instruments to many groups.
Suitability of the norms accompanying a test will depend upon (1) the
comparability of the available group to the standardization group, (2)
the size and adequacy of the standardization groups, and (3) the degree
of refinement in which the results of the previous testing are expressed.

The eighth characteristic of a perfect instrument is that it should yield
readings, or scores, free from error. The type and magnitude of these
errors differ from instrument to instrument. There is no perfect instru-
ment in this respect. Certain kinds of errors creep into the readings or
scores which are obtained from all types and kinds of instruments.

In approaching a discussion of types of errors, it is necessary to con- 
sider first the various types of measuring instruments. Some instruments 
are unitary, such as the thermometer. On the other hand, some instru- 
ments, like most educational and psychological tests, are made up of a 
number of subinstruments generally called items. Each of these items 
purports to measure the same aspect of behavior that the total instrument 
measures. The types of errors which creep into the readings obtained 
from instruments of this kind differ from those noted in connection with 
unitary instruments. 

Both the unitary instrument and the compound instrument are utilized 
in educational and psychological measurement. If the behavior to be 
evaluated is so unique that a single response obtained just once yields 
the evaluation, a unitary instrument is used. For example, if recall of 
the number of states in the United States is wanted, evaluation consisting 
of one item will suffice. Pupil response is directly evaluated from this 
unique instrument which asks for the number of states in the United 
States. 

Unfortunately, perhaps, most aspects of behavior are too broad to be 
evaluated from a single response. A test, then, must consist of a sampling 
of the situations in which this ability might be noted. In this way a com- 
pound instrument is developed consisting of items, each of which is de- 
signed for the purpose of evaluating some aspect of this ability. 

All the types of errors which prevail in a unitary instrument are found 


there are many other ratios between the number of heads and tails
which may be obtained in six trials besides the 3-3 combination. In other
words, there are likely to be errors whenever an attempt is made to find
out the probability of heads coming up from so limited a sample of
throws as six. However, as the number of throws increases, the ratio of
heads to tails will approach unity. Errors of this type are called com-
pensating errors. Contrast these errors with the type that will be obtained
if a penny is used which happens to be loaded in such a way that it will
fall tails more often than heads. In this case a biased error is introduced,
and no amount of increasing the number of throws or trials will produce
a ratio of unity between the number of heads and the number of tails.
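A short simulation makes the distinction between compensating and biased errors concrete. The Python sketch below first computes the exact chance of the 3-3 split in six throws of a fair coin (only about 0.31). It then throws a fair coin and a hypothetical loaded coin an increasing number of times; the loaded coin is assumed to land heads with probability 0.4. With the fair coin the heads-to-tails ratio drifts toward 1. With the loaded coin it settles near 0.4/0.6 ≈ 0.67 however many throws are made.

```python
import math
import random

# Chance of the 3-3 split in six throws of a fair coin: C(6, 3) / 2**6.
p_three_three = math.comb(6, 3) / 2**6  # 20 / 64 = 0.3125

def heads_to_tails(p_heads: float, n_throws: int, rng: random.Random) -> float:
    """Throw a coin n_throws times and return the ratio of heads to tails."""
    heads = sum(1 for _ in range(n_throws) if rng.random() < p_heads)
    tails = n_throws - heads
    return math.inf if tails == 0 else heads / tails

rng = random.Random(1954)  # fixed seed so the run is reproducible
print(f"P(exactly 3 heads in 6 throws) = {p_three_three:.4f}")
for n in (6, 100, 10_000, 100_000):
    fair = heads_to_tails(0.5, n, rng)    # compensating errors only
    loaded = heads_to_tails(0.4, n, rng)  # hypothetical penny loaded toward tails
    print(f"{n:>7} throws   fair: {fair:6.3f}   loaded: {loaded:6.3f}")
```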

Another example which may serve to differentiate biased from com- 
pensating errors may be had from the familiar thermometer. If in its 
manufacture a thermometer has been graduated so that all readings are 
two degrees too high, an infinite number of readings by an infinite num- 
ber of individuals will not yield an approximation to the true temperature. 
This type of error is clearly one of bias. On the other hand, variations of 
a degree or more, depending upon the accuracy of the thermometer used 
in the reading, may be noted when an individual makes the reading more 
than once, or when different individuals read the same thermometer. 
Some of the readings by an individual will be too high and others too 
low. There is a tendency, however, for these errors to cancel each other 
as the number of the readings is indefinitely increased. Also, some indi- 
viduals may have a tendency to read too high, whereas others will have a 
tendency to read too low; but as the number of individuals is increased, 
these errors also have a tendency to cancel each other, thereby falling 
into the class called compensating errors. 

The satisfactoriness of an instrument may be determined in the light 
of these two types of errors. Compensating errors, it will be noted, have 
a tendency to disappear (1) as the number of readings made by an in-
dividual is increased, (2) as the number of individuals making the read-
ing is increased, or (3) as the number of instruments is increased. When
all three of these factors are increased to infinity, compensating errors
entirely disappear. No increase in these three factors, however, will
eliminate the biased errors.
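The thermometer example can be sketched the same way. The Python code below assumes a hypothetical true temperature of 70°, a thermometer graduated 2° too high (the biased error), and random reading errors with a standard deviation of 1° (the compensating errors). Averaging more and more readings removes the random scatter, but the mean converges on 72° rather than the true 70°.

```python
import random
import statistics

TRUE_TEMP = 70.0   # hypothetical true temperature, degrees F
BIAS = 2.0         # thermometer graduated 2 degrees too high (biased error)
READING_SD = 1.0   # assumed spread of individual reading errors (compensating)

def read_thermometer(rng: random.Random) -> float:
    """One reading = true value + constant bias + random reading error."""
    return TRUE_TEMP + BIAS + rng.gauss(0.0, READING_SD)

rng = random.Random(42)
for n in (1, 10, 1_000, 100_000):
    mean_reading = statistics.fmean(read_thermometer(rng) for _ in range(n))
    print(f"{n:>7} readings: mean {mean_reading:7.3f}, "
          f"error {mean_reading - TRUE_TEMP:+.3f}")
```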

There are five types of biased errors prevailing in compound instru- 
ments such as educational and psychological tests. The first type consists 
of errors due to the test constructor's failure to break down the charac- 
teristic to be evaluated to the place where it is relatively homogeneous. 
Suppose an educational experience is provided for students with the pur- 
pose of contributing to the following objectives: (a) the ability to recall 
information, (b) the ability to interpret data, (c) the stimulation of 
interest, (d) the ability to withhold judgment pending acquisition of 
evidence, and (e) skill in laboratory manipulations. Should someone con-
struct a compound instrument consisting of five items, one for each of the
above objectives, the total score would be practically useless for purposes
of interpretation. No one would attempt to evaluate weather conditions as
a totality. Suppose, for example, that the temperature is 50°, the relative
humidity is 40, the wind velocity is 20 miles per hour, and the barometer
reading is 30, making a total of 140. Yet who would say that the weather
condition is 140? How much more meaning can be attached to a total score
obtained on an instrument evaluating highly diversified objectives?

The second type of biased error prevailing in educational and psycho- 
logical evaluation arises from the test constructor’s failure to choose items 
which measure the desired function. The evaluator’s problem at this point 
consists of making sure that each item yields the type of evidence which 
indicates attainment of the objective. No one attempts to find tempera- 
ture with a stop watch, since this instrument does not furnish evidence of 
fluctuations in temperature. In a similar manner, no one should attempt 
to evaluate the ability to interpret data by an informational item, since 
responses to that item will not give, directly at least, evidence of fluctua- 
tions in the ability to interpret data. 

The third type of biased error prevailing in educational and psycho- 
logical testing arises from the test constructor’s failure to choose a good 
cross-section of items. Even though a test is limited (1) to some aspect 
of behavior which is relatively homogeneous and (2) to items which 
measure the desired function, a guarantee is desired that the items furnish 
a good cross-section of all the items which might have been chosen. 
Obviously, an examination in world history cannot be limited to items 
of American history or to items of any specific period. The relative num- 
ber of items to be devoted to each of these areas is, in the final analysis, 
a matter of the philosophy of the test constructor. If a standardized 
philosophy regarding the content of a particular course ever prevails— 
and it is hoped that such never will be the case—it will become possible 
to build a test without this type of error. Recognition of this type of 
error clearly places the responsibility for the assembling of items for a 
test upon those who use the test rather than upon a test technician un- 
familiar with the aspect of behavior which is being evaluated. 

The fourth type of biased error arises from the test constructor’s 
failure to weight the readings or scores on the items in the ratio of their 
importance. Even though the first three types of biased errors are absent, 
still the problem of weighting the responses must be considered. In most 
tests used in education and psychology the scores have been found by 
direct addition, which means that all items have been given equal weight. 
Theoretically, the assumption of equal weighting is serious; practically, 
this assumption is not so difficult to make, since it has been found em- 
pirically that the weights of different items must be changed radically 
in order to produce significant fluctuations in total scores.
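The empirical finding cited above is not reproduced here, but its flavor can be illustrated. The Python sketch below rests on several assumptions, all hypothetical: 40 items, 500 subjects, item successes generated from a logistic model of ability and difficulty, and unequal item weights drawn at random between 1 and 3. It correlates the unweighted total (direct addition) with the weighted total. Under these conditions the two totals typically correlate above 0.9, so the weighting changes the ranking of subjects very little.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2024)
N_SUBJECTS, N_ITEMS = 500, 40
difficulty = [rng.uniform(-1.5, 1.5) for _ in range(N_ITEMS)]  # hypothetical items
weights = [rng.uniform(1.0, 3.0) for _ in range(N_ITEMS)]      # arbitrary unequal weights

unweighted, weighted = [], []
for _ in range(N_SUBJECTS):
    ability = rng.gauss(0.0, 1.0)
    # Assumed logistic model: chance of success rises with ability minus difficulty.
    items = [1 if rng.random() < 1.0 / (1.0 + math.exp(d - ability)) else 0
             for d in difficulty]
    unweighted.append(sum(items))
    weighted.append(sum(w * x for w, x in zip(weights, items)))

print(f"r(unweighted total, weighted total) = {pearson_r(unweighted, weighted):.3f}")
```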

The fifth type of biased error arises from the incompetency of the
reader of the instrument. Satisfactory readings of an instrument by an
individual are predicated on some degree of competency in his ability,
including his integrity, to recognize the evaluation which the instrument
furnishes. For example, some thermometers are calibrated in 2° units
rather than in units of 1°. In this case, one type of incompetence noted
is the tendency to report a temperature of 61° as 62°. Errors resulting
from this kind of incompetency have no necessary tendency to cancel
each other as the number of readers is increased. An even more pro-
nounced type of incompetency is noted when an individual reports speed-
ometer readings somewhat higher than actually observed, or when a
reader of an essay examination gives higher scores to certain individuals
than to others, due to his desire to misrepresent the evaluation which
the instrument yields.

In addition to these five types of biased errors, there are three types of 
compensating errors which must be recognized in instruments for the 
evaluation of behavior. The first type of compensating error results from 
the impossibility of including an infinite number of items in a compound 
instrument. Even though a good cross-section of items is chosen, there
will be some variation between the scores which are made on one series
of situations or items and those which would be obtained from other
lists of situations or items.

The second type of compensating error is due to fluctuations in the 
behavior of a subject at different times. It has long been recognized that 
the responses of an individual may fluctuate even though no learning or 
forgetting has taken place. For example, if a student is asked to compute 
the product of 724 and 328, his response may be 237,472. If the same 
situation is presented to the same student ten minutes later, his response 
may be 236,372. Although practically it is impossible to eliminate this 
type of error, theoretically it can be done by presenting the same situa- 
tion an infinite number of times. 

The third type of compensating error occurs when the responses of the 
subject are scored once by a single judge rather than an infinite number 
of times by an infinite number of competent judges. Numerous investi- 
gations have been reported concerning this type of error, especially with 
reference to essay examinations. 

In summary, the usefulness of an instrument for evaluating behavior 
may be appraised in terms of the following types of errors: 

A. Biased errors resulting from:
1. Failure to break down the characteristic to be evaluated to the place
where it is homogeneous.
2. Failure to choose subinstruments or items which measure the desired
function.
3. Failure to choose a good cross-section of items.
4. Failure to weight the readings or scores on the items in the ratio of their
importance.
5. Reader incompetency or dishonesty.
B. Compensating errors resulting from:
1. Failure to include an infinite number of items.
2. Failure to sample the reaction of subjects an infinite number of times.
3. Failure to utilize an infinite number of judges in scoring the responses.


VALIDITY AND RELIABILITY


Definitions of validity and reliability are based upon the types of 
errors in the foregoing summary. There is far from perfect agreement 
among test constructors as to the type or types of errors to be included 
in each concept. For example, some authorities define validity to include 
all eight types of errors, whereas others limit validity to those of type 
A2, or biased errors resulting from the test constructor’s failure to choose 
items which measure the desired function. There seem to be about as 
many definitions of validity as there are possible combinations of the 
eight types of errors. Reliability, likewise, is defined in terms of various 
combinations of these types of errors. 

Despite differences among test constructors regarding these concepts, 
some definitions are desirable. Validity is defined as the degree to which 
all kinds of errors, compensating and biased, are absent; and reliability
is defined as the degree of consistency with which a test measures what-
ever it does measure, i.e., the degree to which all compensating errors
are absent.

The definitions, involving as they do absence of errors, presuppose 
that true scores can be obtained. Actually, however, true scores cannot 
be determined. Thus, validity and reliability represent concepts which 
cannot be appraised directly, although by making certain assumptions 
their magnitudes can be estimated. 


academic achievement, the coefficient of correlation is a satisfactory esti- 
mate of the coefficient of validity. 
The amount of confidence wh 


cause of his philosophy, 
effectiveness has been computed 


The size of a coefficient of correlation between test scores and a crite-
rion, like the size of all other coefficients of correlation, is a function of
the range of talent in the group in which it has been computed. Thus,
obtained coefficients will be lower when computed in a homogeneous group
of students than when computed in a heterogeneous group. An appraisal
of such coefficients of correlation used as measures of test validity, then,
must consider (1) the satisfactoriness of the assumed criterion and (2)
the homogeneity of the group in which it was computed.
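The effect of range of talent on a validity coefficient can be sketched with a small simulation. It assumes, hypothetically, that test and criterion scores each equal a common ability plus independent error, so that the correlation in the full, heterogeneous group is about 0.6 by construction. It then recomputes the correlation among only the top fifth of scorers on the test, a far more homogeneous group.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
tests, criteria = [], []
for _ in range(20_000):
    ability = rng.gauss(0.0, 1.0)               # hypothetical common ability
    tests.append(ability + rng.gauss(0.0, 0.8))     # test score = ability + error
    criteria.append(ability + rng.gauss(0.0, 0.8))  # criterion = ability + error

r_full = pearson_r(tests, criteria)

# Keep only the top 20 per cent on the test: a much more homogeneous group.
cutoff = sorted(tests)[int(0.8 * len(tests))]
selected = [(t, c) for t, c in zip(tests, criteria) if t >= cutoff]
r_selected = pearson_r([t for t, _ in selected], [c for _, c in selected])

print(f"heterogeneous group: r = {r_full:.2f}")
print(f"homogeneous group:   r = {r_selected:.2f}")
```

Nothing about the test changes between the two lines of output; only the range of talent in the group does.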

A less refined but occasionally used statistical procedure for investi-
gating the validity of a test is that of determining the extent to which the
test will reflect differences among or between groups of subjects assumed
to possess differing amounts of the characteristic measured by the test.
For example, an attitude-toward-farming scale reflected significant dif-
ferences between successful farmers and others not engaged in farming.
It can readily be seen that the significance of any statistic used in such
circumstances must be appraised by the same criteria as measures of
predictive effectiveness. Thus, if only successful farmers were used in the
foregoing example of the attitude-toward-farming scale, the differences
reflected would probably be much greater than if the choice of this group
were less restricted.

Reliability has been defined as the degree of consistency with which a 
test measures whatever it does measure, or the degree to which all com- 
pensating errors are absent. One of the more serious types of compensat- 
ing errors in some tests is the inconsistency between the scores assigned 
by one judge and the scores assigned by a second judge to the perform- 
ance of a single group of subjects. This characteristic is referred to as 
objectivity, or reader reliability, and is usually reported as a coefficient 
of correlation between scores assigned by different judges. In many cases, 
the objectivity of the test is not seriously open to question as, for ex- 
ample, in the so-called objective tests. 
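Reader reliability reported this way is an ordinary Pearson coefficient computed on two judges' scores for the same papers. The Python sketch below uses hypothetical scores for five essays; the scores are invented for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores assigned to the same five essays by two judges.
judge_a = [12, 15, 9, 18, 14]
judge_b = [11, 16, 10, 17, 12]

objectivity = pearson_r(judge_a, judge_b)
print(f"reader reliability (objectivity): r = {objectivity:.2f}")  # 0.92
```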

Most published tests report some evidence of test reliability. The inter- 
pretation of this reported reliability requires careful appraisal of the 
methods and techniques used by the investigator. There are three meth- 
ods in use for computing measures of reliability. First, fluctuations which 
appear when a test is readministered to the same group of subjects may 
be noted and evaluated. When this method is utilized, it is apparent that 
errors due to sampling test items are not allowed to fluctuate as though 
different test items were provided. The reported reliability will have a 
tendency to be higher than the reliability which would have been found 
had compensating errors due to sampling test items been allowed to 
fluctuate. With achievement and “personality” tests, a second objection
arises, since part of the fluctuations in scores between successive admin-
istrations may be due to different rates of growth or change in whatever
characteristic the test measures. This objection may not be so serious if
successive administrations are but a day or two apart. However, if the
re-administrations are very close together, the results may be influenced
by differential practice effect in taking the test. This method of comput-
ing reliability has been most frequently used with aptitude tests, where it
seems reasonable to assume that the measured behavior has not changed.

Second, fluctuations in scores may be noted and evaluated whenever 
alternate forms of a test are administered to the same group of subjects, 
either at the same time or with a lapse of time intervening. This method 
of computing reliability is generally used when parallel forms of a test 
are available. 

The reliability obtained by comparing scores on parallel forms of a 
test will be influenced by the extent to which the test forms have been 
forced to be equivalent. If in the construction of the two forms, items 
are matched for difficulty as indicated from previous administration, the 
obtained reliability will be increased. This increase in the reliability re- 
sults from restricting the fluctuation in test items to less than that which 
would prevail if each form were an unrestricted random sample of items. 

Third, fluctuations in scores may be noted and evaluated when a single
test is administered to one group of subjects. Of the two approaches
to obtaining reliability from one administration, the more widely used
method is based upon arbitrarily dividing the test into two subtests.
These subtests are usually developed by considering the odd-numbered
items as one test and the even-numbered items as a second test. Actually,
measures of reliability computed in this case represent reliabilities not
of the whole test, but rather of half the test. From the obtained reli-
ability of half the test, an appropriate formula (the Spearman-Brown
formula), to be discussed in detail later in this chapter, will predict
the expected reliability of a test twice as long as either half test.
It can be seen that this method for computing reliability is a modifica-
tion of the second method. The second approach to estimating reliability
from a single administration involves computing a reliability coefficient
based upon the relationships between the individual test items and their
total score.
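The odd-even procedure can be sketched in a few lines of Python. The item responses below are hypothetical: 300 subjects answer a 50-item test under an assumed logistic model of ability and item difficulty. The code totals the odd- and even-numbered items separately and correlates the two half-test scores. It then steps the result up to full length with the Spearman-Brown formula discussed later in this chapter.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(responses):
    """Odd-even split-half reliability, stepped up with the Spearman-Brown formula.

    responses: one list of 0/1 item scores per subject, items in test order.
    """
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_oe = pearson_r(odd_totals, even_totals)             # reliability of a half test
    return r_oe, 2.0 * r_oe / (1.0 + r_oe)                # estimate for the full test

# Hypothetical 50-item test given to 300 subjects (assumed logistic item model).
rng = random.Random(3)
difficulty = [rng.uniform(-2.0, 2.0) for _ in range(50)]
responses = []
for _ in range(300):
    ability = rng.gauss(0.0, 1.0)
    responses.append([1 if rng.random() < 1.0 / (1.0 + math.exp(d - ability)) else 0
                      for d in difficulty])

r_half, r_full = split_half_reliability(responses)
print(f"odd-even r = {r_half:.2f}; Spearman-Brown estimate = {r_full:.2f}")
```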

The statistical measures of reliability which are computed are the co- 
efficient of reliability, the index of reliability, and the standard error of 
measurement. The coefficient of reliability is the coefficient of correlation 
between the scores made on successive administrations of the same test, 
between scores made on similar forms of a test, or between scores on the 
odd and even items after application of the Spearman-Brown formula. 
The coefficient of reliability, of course, assumes that an infinite number 
of items might have been chosen for the test. 
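The definitions above treat each obtained score as a true score plus compensating error. Under that model, assumed here for illustration, the correlation between two parallel forms estimates the share of obtained-score variance that is true-score variance. The Python sketch below uses hypothetical spreads of 4 points for true scores and 2 points for errors, which give a model value of 16/20 = 0.80. It checks that two simulated parallel forms correlate at about that level.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(11)
SD_TRUE, SD_ERROR = 4.0, 2.0   # assumed spreads of true scores and of errors
true_scores = [rng.gauss(50.0, SD_TRUE) for _ in range(50_000)]
form_a = [t + rng.gauss(0.0, SD_ERROR) for t in true_scores]  # obtained = true + error
form_b = [t + rng.gauss(0.0, SD_ERROR) for t in true_scores]  # independent error

r_ab = pearson_r(form_a, form_b)
expected = SD_TRUE**2 / (SD_TRUE**2 + SD_ERROR**2)  # 16 / 20 = 0.80
print(f"simulated coefficient of reliability: {r_ab:.3f} (model value {expected:.2f})")
```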

Like any coefficient of corr 
ity is a function of the rang 




produce a coefficient of reliability of 0.90 or more when computed in a 
mixed group of pupils from grades four to eight, inclusive. The group of 
individuals utilized in determining reliability should not include subjects 
for whom the test is inappropriate. When reliability is determined in a
group of subjects more homogeneous than the group for which the test is
appropriate, the obtained coefficient of reliability will be reduced.

A summarization, then, of the interpretation and meaning of a coeffi-
cient of reliability needs to consider the underlying assumptions which
are made when it is computed. These assumptions are (1) that errors
existing between parallel forms or between the odd and even items are
compensating and that biased errors between forms or subtests are wholly
absent; (2) that all types of compensating errors are free to fluctuate
in each form of the test; and (3) that the test items chosen are sampled
from an infinite number of possible items.

The second statistical measure of reliability is the index of reliability. 
Theoretically, it is the expected coefficient of correlation which would be 
obtained if the scores on a given test were correlated with the true scores 
on that test. It can be shown mathematically that the best estimate of 
this theoretical correlation is obtained by extracting the square root of 
the coefficient of reliability. Thus, for any coefficient between 0 and 1,
the index of reliability is greater than the coefficient of reliability. For
example, a coefficient of reliability of 0.81 is equivalent to an index of
reliability of 0.90. Other than this
difference, interpretations of the index of reliability are similar to inter- 
pretations of the coefficient of reliability, and, likewise, are subject to the 
same assumptions and limitations. 
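The square-root relation can be checked against the text's example and against a simulation. The Python sketch below assumes the same true-score-plus-error model as before, now with hypothetical spreads chosen so that the model coefficient of reliability is 0.80. It shows that obtained scores correlate with the (normally unknowable) true scores at about √0.80 ≈ 0.894.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# The text's example: a coefficient of 0.81 corresponds to an index of 0.90.
index = math.sqrt(0.81)
print(f"index of reliability = {index:.2f}")  # 0.90

# Simulation check under an assumed true-score-plus-error model.
rng = random.Random(5)
SD_TRUE, SD_ERROR = 9.0, 4.5   # hypothetical; model coefficient = 81 / 101.25 = 0.80
true_scores = [rng.gauss(0.0, SD_TRUE) for _ in range(50_000)]
obtained = [t + rng.gauss(0.0, SD_ERROR) for t in true_scores]
model_coefficient = SD_TRUE**2 / (SD_TRUE**2 + SD_ERROR**2)
r_true = pearson_r(obtained, true_scores)
print(f"r(obtained, true) = {r_true:.3f}; "
      f"sqrt(coefficient) = {math.sqrt(model_coefficient):.3f}")
```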

Both the coefficient of reliability and the index of reliability are func-
tions of the range of talent in the group in which they have been com-
puted. In some cases, it is desirable to have a measure which is not subject
to variations in range of scores. For that purpose, a statistical measure
called the standard error of measurement, or the standard error of a test 
score, is quite often employed. 

Each time a test is administered to an individual, his obtained score 
deviates from his true score because of compensating errors. Thus de- 
scriptions of the variability prevailing in this theoretical distribution of 
deviations represent one means of estimating reliability. The standard 
error of a score is such an estimate. This statistical measure is interpreted
in exactly the same way as any other standard error, and fiducial limits for
the true score can be set whenever desired.

This measure is likewise subject to all the assumptions of the coefficient
of reliability and the index of reliability. In addition, it has the
disadvantage that it has no meaning to anyone not familiar enough with
a given test to know, in general, the size of scores which will be obtained
when this test is administered to a group of subjects.
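This section does not give a computing formula for the standard error of measurement. A commonly used one, assumed here, is s_x √(1 − r_xx), where s_x is the standard deviation of obtained scores and r_xx is the coefficient of reliability. The Python sketch below uses hypothetical values (s_x = 10, r_xx = 0.91, an obtained score of 65). It sets approximate 95 per cent limits as the obtained score ± 1.96 standard errors, which is one simple convention among several.

```python
import math

def standard_error_of_measurement(sd_obtained: float, reliability: float) -> float:
    """Assumed formula: SEM = s_x * sqrt(1 - r_xx)."""
    return sd_obtained * math.sqrt(1.0 - reliability)

# Hypothetical test statistics and a hypothetical obtained score.
sem = standard_error_of_measurement(sd_obtained=10.0, reliability=0.91)
obtained = 65.0
lower, upper = obtained - 1.96 * sem, obtained + 1.96 * sem

print(f"standard error of measurement = {sem:.2f}")           # 3.00
print(f"approximate 95% limits: {lower:.2f} to {upper:.2f}")  # 59.12 to 70.88
```

Because the result is expressed in raw score points, a reader must know the usual size of scores on the test to judge whether 3 points is large or small, which is the disadvantage the text notes.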




COMPUTING TEST RELIABILITY 


Computing a coefficient of correlation between the scores made in a 
single group on parallel forms of a test directly yields the coefficient of 
reliability. When only a single test is available and the coefficient of reli- 
ability is desired, one approach to estimating the reliability coefficient 
in the circumstance being considered involves dividing the test into two 
similar subtests—generally a test of odd-numbered items and a test of 
even-numbered items. The coefficient of correlation between the scores 
on the odd items and the even items yields the coefficient of reliability of 
a test of one-half the length of the original test. An estimate is then made 
of the expected coefficient of reliability of a test, had it been twice as long 
as either of the half-tests. This estimate is reported as the coefficient of 
reliability and is obtained by the Spearman-Brown modified formula 


$$ r_{xx} = \frac{2r_{oe}}{1 + r_{oe}} $$

where
$r_{xx}$ = the coefficient of reliability of the test
$r_{oe}$ = the coefficient of correlation between odd and even items.


Thus, if a coefficient of correlation of 0.60 is found between the odd and
even forms of a test originally 50 items in length (25 odd and 25 even
items), the coefficient of reliability is reported for the 50-item test to be
0.75, as indicated by the following substitution:


$$ r_{xx} = \frac{2(0.60)}{1 + 0.60} = \frac{1.20}{1.60} = 0.75 $$

Likewise, the reliability of a test any number of times as long as the
test forms used in the original computation of the coefficient of correla-
tion may be found. In this instance the general formula is

$$ r_{Nxx} = \frac{N r_{xx}}{1 + (N - 1) r_{xx}} $$

where
$r_{Nxx}$ = the reliability coefficient of a test after lengthening
$N$ = the number of times the test is to be lengthened
$r_{xx}$ = the coefficient of correlation between forms prior to lengthening


This formula is the original Spearman-Brown “prophecy formula.” If an
estimate of the reliability coefficient resulting from lengthening the fore-
going test to 200 items is desired, substitution into the general formula,
with N = 200/25 = 8 relative to a 25-item half test, would yield

$$ r_{Nxx} = \frac{8(0.60)}{1 + (8 - 1)(0.60)} = \frac{4.80}{5.20} = 0.92 $$
An estimation of the reliability coefficient of a test form consisting of
10 items (N = 10/25 = 0.4) would be

$$ r_{Nxx} = \frac{0.4(0.60)}{1 + (0.4 - 1)(0.60)} = \frac{0.24}{0.64} = 0.375 $$

The Spearman-Brown formula can be used to determine the number of
times a test must be lengthened to reach a given reliability coefficient.
The general formula may be rewritten for the solution of N with a postu-
lated reliability as follows:

$$ N = \frac{r_{Nxx}(1 - r_{xx})}{r_{xx}(1 - r_{Nxx})} $$

where
$N$ = the number of times a test is to be lengthened
$r_{xx}$ = the reliability of the test prior to lengthening
$r_{Nxx}$ = the postulated reliability coefficient


To estimate how many times the foregoing 50-item test, which had a
reliability of 0.75, would need to be lengthened to reach a postulated
reliability coefficient of 0.90, substitute into the foregoing formula:

$$N = \frac{0.90(1 - 0.75)}{0.75(1 - 0.90)} = 3$$

Thus, three times fifty, or 150, items would be required for the
postulated reliability coefficient of 0.90.
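Solving for N can be sketched the same way. The function name `lengthening_factor` and its range check are our own additions; the formula is the one given above.

```python
def lengthening_factor(r_xx: float, r_target: float) -> float:
    """Times a test must be lengthened to reach r_target (Spearman-Brown solved for N)."""
    if not (0 < r_xx < 1 and 0 < r_target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return r_target * (1 - r_xx) / (r_xx * (1 - r_target))


n = lengthening_factor(0.75, 0.90)
items = n * 50  # the current test has 50 items
print(round(n, 2), round(items))  # → 3.0 150
```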

The major assumption underlying the application of the Spearman- 
Brown prophecy formula is that the items concerned in the lengthening 
or shortening of the test are homogeneous with respect to those of the 
original test. When the assumption of homogeneity is not met, the Spear- 
man-Brown formula will yield an underestimate of reliability upon ex- 
pansion of the test and an overestimate upon contraction of the test. 

For a test in which each subject’s score depends solely upon the oppor- 
tunity he has to attempt items, the use of odd- and even-numbered items 
in estimating reliability is open to serious question. In this instance the 
compensating errors would be composed largely of errors in recording the 
responses. Presumably each subject would have successes on all items 
which he attempted. The Spearman-Brown formula is therefore not ap- 
propriately applied to speed-test data of this type. 

It can readily be noted that there are many ways of splitting a test 
into halves, or of making parallel forms from a pool of items. Arbi- 
trarily constructing two forms yields only two of many possible forms. 
The use of the split-half method as well as the parallel forms method has 
been criticized for this reason, since it is practically impossible to construct
all possible forms of a test and estimate the reliability for all possible 




combinations of two forms. For these reasons other methods of estimating 
reliability coefficients from one test administration have been developed. 
Kuder and Richardson have proposed several formulas for estimating 
reliability coefficients based upon item-total score interrelationships. Of 
the various formulas which have been developed by these individuals, the 
method known as rational equivalence is perhaps the most widely used. 
The formula is 


$$r_{xx} = \frac{n}{n - 1}\left(\frac{\sigma_t^2 - \sum pq}{\sigma_t^2}\right)$$

where
$n$ = the number of items in the test
$p$ = the proportion of subjects responding correctly to an item
$q$ = $1 - p$
$\sigma_t^2$ = the variance of the total test scores
The assumptions underlying the method of rational equivalence are 
the same as those underlying the correlation of odd and even items and 
the subsequent application of the Spearman-Brown formula. To the ex- 
tent that the items of the test measure the same function, these two 
methods will yield similar estimates of the reliability of a test. 
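A minimal sketch of the rational-equivalence (Kuder-Richardson) computation follows, assuming items scored 1 (right) or 0 (wrong). The five-subject, four-item response matrix is hypothetical, and we assume the divide-by-N (population) variance so that it agrees with the p and q proportions; the text does not specify which variance to use.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Rational-equivalence reliability from a subjects-by-items 0/1 matrix."""
    n_subjects = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    var_t = pvariance(totals)  # variance of total test scores
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_subjects  # proportion correct
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (var_t - sum_pq) / var_t


# Hypothetical responses: rows are subjects, columns are items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3))  # → 0.8
```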
The standard error of a test score, or standard error of measurement 
as it is sometimes called, is found from the following formula 


$$\sigma_{\text{score}} = \sigma_t\sqrt{1 - r_{xx}}$$

where
$\sigma_{\text{score}}$ = standard error of measurement of a test (or standard error of a score)
$\sigma_t$ = standard deviation of test scores in the group in which the coefficient of reliability has been computed
$r_{xx}$ = the coefficient of reliability of the test


Thus, for an obtained coefficient of reliability of 0.75 in a group of
subjects whose test scores have a standard deviation of 20, the standard
error of a test score is found by the following substitution:

$$\sigma_{\text{score}} = 20\sqrt{1 - 0.75} = 10$$


In a large group of subjects the true score of 95 out of every 100 subjects 
would be within their obtained score plus or minus 1.96 times this value. 
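The standard error and the band of plus or minus 1.96 standard errors can be sketched as follows. The obtained score of 100 is hypothetical, chosen only to show the width of the band (plus or minus 19.6 points when the standard error is 10).

```python
import math


def standard_error_of_measurement(sd: float, r_xx: float) -> float:
    """Standard error of a test score: sd * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - r_xx)


sem = standard_error_of_measurement(20, 0.75)
obtained = 100  # hypothetical obtained score
low, high = obtained - 1.96 * sem, obtained + 1.96 * sem
print(sem, round(low, 1), round(high, 1))  # → 10.0 80.4 119.6
```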


COMPUTING TEST VALIDITY 


The most frequently reported measure of the validity of a test in a 
given circumstance is the coefficient of correlation between test scores 
and measures of a selected criterion. Unless the criterion measures are 
true scores, such a correlation is more appropriately referred to as a 
measure of predictive effectiveness than a coefficient of validity. It can 




be readily seen that many such correlations can be, and frequently are, 
reported for a given test, since different criteria may be selected, and 
different groups of subjects may be used in evaluation of the instrument. 
For example, a college aptitude test was reported to have a coefficient 
of correlation between test scores and first semester grade-point averages 
of 0.52 for engineering freshmen and 0.44 for home-economics freshmen.

In instances where the selected criterion represents a numerical vari- 
able, the zero-order coefficient of correlation is used to determine pre- 
dictive effectiveness. Occasionally, however, the criterion cannot be ac- 
curately measured. When this situation prevails, some other measure of 
correlation is used. For example, the criterion may be a forced dichot- 
omy, such as tendency to receive a graduate degree. Thus, a biserial 
correlation coefficient of 0.43 was found between scores on a graduate ap- 
titude test and the tendency to obtain one or more advanced degrees after 
having once matriculated as a graduate student. The correlation tech- 
nique used in determining the predictive effectiveness of a test will de- 
pend upon the way in which the criterion is defined and is measured. 
The conditions under which all measures of predictive effectiveness were 
obtained, the technique of computation used, and the characteristics of 
the subjects should be carefully described. 

Measures of the predictive effectiveness of a test are influenced both 
by the unreliability of the test and the unreliability of the criterion. 
Closely related to the unreliability of the test is its length. Increasing 
the length of the test should increase the reported predictive effectiveness 
since less deviation from each individual's true score is reflected.
Increases in test length by the addition of homogeneous items will result
in small increases in predictive effectiveness. If the added items
contribute to the measurement of aspects of the criterion not previously
measured, the predictive effectiveness will be considerably increased.

An estimate of the coefficient of correlation between score on a test 
and a criterion to be expected from lengthening a test with known reli- 
ability can be obtained from the formula: 


$$r_{(Nx)y} = \frac{r_{xy}}{\sqrt{\dfrac{1 - r_{xx}}{N} + r_{xx}}}$$

where
$r_{(Nx)y}$ = estimated predictive effectiveness which will result from lengthening N times a test with a reliability coefficient of $r_{xx}$
$r_{xx}$ = reliability coefficient of the test
$r_{xy}$ = obtained predictive effectiveness between the test and the criterion


It is assumed in the development of the formula that the added items
will be homogeneous with those of the original test.
Thus, if a test had a reliability coefficient of 0.85 and a correlation 




with the criterion of 0.40, tripling the length of the test should yield a 
correlation coefficient of 0.42, as the following solution indicates: 


$$r_{(Nx)y} = \frac{0.40}{\sqrt{\dfrac{1 - 0.85}{3} + 0.85}} = 0.42$$


The foregoing equation can be solved for N to indicate how many times 


a test must be lengthened to yield any postulated correlation coefficient. 
Thus, the formula becomes 


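$$N = \frac{r_{(Nx)y}^2 (1 - r_{xx})}{r_{xy}^2 - r_{xx}\, r_{(Nx)y}^2}$$

where $r_{(Nx)y}$ is now the postulated correlation coefficient.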
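Both directions of the lengthening formula can be sketched in a few lines. The function names are ours. The guard in `factor_for_validity` reflects the fact that no amount of lengthening can push the correlation past $r_{xy}/\sqrt{r_{xx}}$, the ultimate limit discussed next.

```python
import math


def lengthened_validity(r_xy: float, r_xx: float, n: float) -> float:
    """Predicted test-criterion correlation after lengthening the test n times."""
    return r_xy / math.sqrt((1 - r_xx) / n + r_xx)


def factor_for_validity(r_xy: float, r_xx: float, r_target: float) -> float:
    """Times the test must be lengthened to reach r_target (formula solved for N)."""
    denom = r_xy**2 - r_xx * r_target**2
    if denom <= 0:
        raise ValueError("r_target is at or beyond the limit r_xy / sqrt(r_xx)")
    return r_target**2 * (1 - r_xx) / denom


r3 = lengthened_validity(0.40, 0.85, 3)  # tripling the test
print(round(r3, 2))                                   # → 0.42
print(round(factor_for_validity(0.40, 0.85, r3), 6))  # → 3.0
```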


If an estimate is desired of the ultimate limit of the predictive
effectiveness that would be obtained if the test were made perfectly
reliable, the following formula can be used:

$$r_{\infty y} = \frac{r_{xy}}{\sqrt{r_{xx}}}$$

where
$r_{\infty y}$ = estimated validity coefficient from a test with perfect reliability


Likewise, for a criterion with postulated perfect reliability the formula is

$$r_{x\infty} = \frac{r_{xy}}{\sqrt{r_{yy}}}$$

where
$r_{x\infty}$ = estimated validity coefficient
$r_{yy}$ = reliability of the criterion
Postulating perfect reliability for the measurement of a criterion yields a 
concept which is meaningful and which corresponds to the definition of 
test validity. Frequently, however, the reliability of the criterion is
difficult to determine. For example, what is the reliability of course
marks assigned by a given teacher?


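In the foregoing paragraphs the relation between a perfectly reliable
test and an unreliable criterion, or vice versa, was discussed. It is also
possible to estimate the correlation between a perfectly reliable test and
a perfectly reliable criterion through the use of the correction for
attenuation. Several formulas for correcting for attenuation, that is, for
reducing the effects of compensating errors in two series of
measurements, have been developed. One such formula is

$$r_{\infty\infty} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}$$

where
$r_{\infty\infty}$ = estimated correlation coefficient for two perfectly reliable series of measurements, i.e., tests of infinite length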

Thus, if the correlation between a test and a criterion were 0.40, and 
the reliabilities of the test and criterion were 0.70 and 0.90, respectively, 
the correlation coefficient corrected for attenuation would be 0.50. It 
should be emphasized that a correlation coefficient corrected for attenua- 
tion is only a condition postulated through mathematical theory and 
cannot be obtained under practical circumstances. Since the correction 
yields values higher than the obtained correlation, extreme care must be 
exercised to identify the corrected value. 
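The three corrections just described differ only in which reliabilities appear under the radical. A small sketch, with function names of our own choosing, reproduces the corrected value of 0.50 from the example. Nothing in these formulas keeps the result at or below 1.0, which is the difficulty taken up next.

```python
import math


def correct_test_only(r_xy: float, r_xx: float) -> float:
    """Validity if the test alone were made perfectly reliable."""
    return r_xy / math.sqrt(r_xx)


def correct_criterion_only(r_xy: float, r_yy: float) -> float:
    """Validity if the criterion alone were measured with perfect reliability."""
    return r_xy / math.sqrt(r_yy)


def correct_both(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Correlation corrected for attenuation in both series of measurements."""
    return r_xy / math.sqrt(r_xx * r_yy)


r_xy, r_xx, r_yy = 0.40, 0.70, 0.90  # values from the example above
print(round(correct_test_only(r_xy, r_xx), 3))       # → 0.478
print(round(correct_criterion_only(r_xy, r_yy), 3))  # → 0.422
print(round(correct_both(r_xy, r_xx, r_yy), 2))      # → 0.5
```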

Frequently the correction for attenuation procedure just described will 
yield values larger than unity. Obviously this situation has no meaning. 
A formula for correction for attenuation which yields more stable
estimates than other formulas of this type, in that the estimated
correlations are less frequently greater than unity, is

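$$r_a = \sqrt{1 - \bar{r}_o^{\,2} + \bar{r}_a^{\,2}}$$

where
$r_a$ = correlation coefficient corrected for attenuation
$\bar{r}_o$ = average of the intraform correlation coefficients (within tests)
$\bar{r}_a$ = average of the interform correlation coefficients (among tests)

and where the average correlations have been obtained by converting each
coefficient to

$$z = \frac{1}{2}\log_e\frac{1 + r}{1 - r}$$

averaging the z values, and converting the mean z back to r.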


Let it be supposed that two tests, A and B, have been administered to
a group of subjects. Each test has two forms, I and II. The inter- and
intraform correlations are shown in Table 129.


TABLE 129. Inter- and Intra-Test Coefficients of Correlation

FORM     AI       AII      BI       BII
AI                0.777    0.653    0.517
AII                        0.700    0.498
BI                                  (illegible)


When these values are averaged they are found to be

$$\bar{r}_a = 0.599 \qquad \bar{r}_o = 0.668$$

Substituting these values into the formula gives

$$r_a = \sqrt{1 - (0.668)^2 + (0.599)^2} = 0.955$$




Thus, the conclusion would be reached that these two tests, when cor- 
rected mathematically for the unreliability in each, have a high inherent 
relationship as evidenced by the corrected correlation of only slightly 
less than unity. 
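The averaging through z and the stable correction can be checked with a short sketch. Treating 0.653, 0.517, 0.700, and 0.498 as the four A-with-B (interform) coefficients is our reading of the damaged Table 129; that reading does reproduce the reported average of 0.599. The intraform average of 0.668 is taken directly from the text. Python's `math.atanh(r)` equals $\frac{1}{2}\log_e\frac{1+r}{1-r}$.

```python
import math


def average_r(rs: list[float]) -> float:
    """Average correlations through z = atanh(r), then convert back with tanh."""
    mean_z = sum(math.atanh(r) for r in rs) / len(rs)
    return math.tanh(mean_z)


interform = [0.653, 0.517, 0.700, 0.498]  # our reading of Table 129
r_bar_a = average_r(interform)
r_bar_o = 0.668  # average intraform coefficient as reported in the text
r_a = math.sqrt(1 - r_bar_o**2 + r_bar_a**2)
print(round(r_bar_a, 3), round(r_a, 3))  # → 0.599 0.955
```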


ITEM ANALYSIS 


Item analysis is a process of evaluating the effectiveness of individual 
test items by determining the relationship of the item responses of a 
designated group to a criterion. The designated group may be those 
individuals involved in the standardization of a test, it may be a group 
known to possess some defined behavior characteristics or experience, or 
it may be composed of individuals to whom the item has been adminis- 
tered in preliminary or experimental form. The criterion used may be an 
internal one, i.e., dependent upon the remainder of the items in the test, 
such as total test score, or an external one, such as behavior or charac- 
teristics of the subjects entirely independent of the test items being 
evaluated. Another criterion frequently used is the difficulty of the item 
as indicated by the proportion of the group responding to it in a given 
way. The criterion of difficulty obviously involves only responses to the 
item being evaluated. In the construction of a test or the conduct of an 
experiment several item analyses are often made. 

Since item analysis yields information concerning each item individu- 
ally, there are many aspects of test construction to which this process is 
of value. For example, item analysis yields evidence regarding the effec- 
tiveness of distractors in multiple-choice items. Whenever all subjects in 
a representative standardization group ignore possible responses, this sug- 
gests that the item should be reworded. If the order of difficulty of a 
group of items is desired, this evidence can be had by noting the propor- 
tion of the group responding to each item correctly. Item analyses are 
frequently made to determine changes in behavior as a result of some 
educational experience. Such an evaluation involves comparing the re- 
sponses to an item when it is used in both pre- and post-tests. For any 
single test administration, item analysis may be used to diagnose learning 
difficulties as well as to obtain suggestions for teaching. On some occa- 
sions, item analysis is used in the construction of scoring keys by identi- 
fying a group possessing some behavior characteristic and comparing their 
responses with those of another group. The foregoing procedure also yields 


evidence which can be used to determine scoring weights to assign to 
various responses.


requires machine-scored answer sheets. Occasionally punched cards of 
the IBM type can be used. If such services are not available, the 
tabulation can be made manually. 

Many methodological approaches to item analysis have been developed.
The amount of refinement in the item analysis technique chosen
for use in any particular circumstance will, of course, depend upon the
purpose for which the results will be used. In general, the final results 
yielded by different statistical methods of item analysis are considerably 
more similar than the results obtained by using different criteria to 
evaluate items when the same methodological procedure is employed. 

One of the most frequently used statistical procedures for item analysis 
is that of obtaining the correlation between item response and total test 
score. If the item response is a dichotomy, biserial correlation is used. If 
the item response is numerically evaluated in several categories, Pearson 
product-moment correlation is often used, although the use of triserial, 
quadriserial, quintiserial correlation, no doubt, would be more appropri- 
ate. When the same total score is used as the criterion for item selection, 
spuriousness is introduced. The spuriousness of each item correlation will 
depend on the number of items in the test and their difficulty, since the 
response to the item concerned has also been included in the total score. 
Since the spuriousness affects each item, however, its elimination is prob- 
ably not worth the effort of rescoring each test as many times as there 
are items so as to remove the self-correlation. 
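A minimal sketch of the biserial coefficient for a right/wrong item follows, using the standard formula $r_{bis} = \frac{M_p - M_q}{\sigma_t}\cdot\frac{pq}{y}$, where $y$ is the normal ordinate at the point dividing p and q. The function name and the five-subject data are hypothetical. The criterion list may be the total score, with the spuriousness just described, or an external criterion. With small or non-normal samples the biserial coefficient can exceed 1.

```python
from statistics import NormalDist, fmean, pstdev


def biserial(item: list[int], criterion: list[float]) -> float:
    """Biserial r between a 0/1 item and a numerically measured criterion."""
    passed = [c for i, c in zip(item, criterion) if i == 1]
    failed = [c for i, c in zip(item, criterion) if i == 0]
    p = len(passed) / len(item)  # proportion answering correctly
    q = 1 - p
    y = NormalDist().pdf(NormalDist().inv_cdf(q))  # ordinate at the p/q split
    return (fmean(passed) - fmean(failed)) / pstdev(criterion) * p * q / y


# Hypothetical: five subjects' item responses and criterion scores.
print(round(biserial([0, 1, 0, 1, 1], [2, 3, 4, 5, 6]), 3))  # → 0.732
```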

Analyzing items by correlating the responses to each item with the 
total test score assumes that the total score is an appropriate index of 
the behavior which the test has been designed to measure. Thus selecting 
items for a test which correlate highly with their total score tends to yield
items which correlate highly with each other. It can be seen, then, that it
is only appropriate to use a total score as a criterion when the behavior 
measured by the total score is homogeneous. If the total score represents 
heterogeneous behavior through containing items measuring several dif- 
ferent behaviors, those behaviors which are represented by the fewest 
items in the test will be those most likely to be eliminated simply be- 
cause there may not have been enough of them to contribute heavily to 
the total score. This strongly suggests that item analyses on the basis 
of total score should be made separately for subtest scores if items can 
logically be classified into subtests, or only when total criterion scores
represent reasonably homogeneous behavior.

Measures of some criterion behavior external to the test may also be 
used to evaluate items. For example, course marks which students make 
might be used as a criterion for selecting items in a scholastic aptitude 
test. When this procedure is followed the homogeneity of the final test 
will be determined by the homogeneity of the criterion. 

The way in which the criterion has been measured and the manner of 




responding to the item will determine the statistical technique used for 
the item analysis. When the criterion is a dichotomized variable, for ex- 
ample, attrition-survival in education, and the item response is also 
dichotomized, either tetrachoric correlation or the phi coefficient can be 
used to indicate the effectiveness of individual items, the responses to 
which are also dichotomies. If the criterion is evaluated numerically, and 
the item responses have been dichotomized, biserial correlation is appro- 
priate. When both the criterion and the responses are numerically evalu- 
ated in several categories, other serial correlation or the product-moment 
correlation is used. 

If correlational procedures are followed in item analysis, responses of 
all subjects are used and the resulting coefficients can be tested for their 
significance from zero. In most instances the test constructor is more 
interested in the relative value of available items than in significance. 
To reduce the labor required in item analysis when this situation prevails, 
several tables yielding estimates of the correlation between item responses 
and numerical and dichotomous criteria have been developed. Most of 
the tables proposed make use of the responses of subjects falling into 
only the upper and lower extremes of the criterion distribution; hence, 
allowing the test constructor to eliminate the middle portion of the
distribution. Such a table developed by Flanagan¹ yields an estimate of the
correlation between item responses and a criterion based upon the
proportion of success responses to an item of the upper and lower 27 per
cent of the criterion distribution. A table developed by Davis² yields an
indication of the effectiveness of items based upon the discriminating
power of an item and its difficulty. An abac developed by Guilford³
greatly lessens the computation required for determining phi coefficients.
If the responses of all subjects are used, values yielded by this procedure
can be tested for significance. 

The values yielded by most tables designed to assist test constructors 
with item analyses cannot be tested by the usual tests of significance.
This is especially true whenever the criterion distribution is mutilated 
by ignoring the responses of some subjects. The tremendous savings in 
labor through the use of such tables, however, ordinarily offsets the dis- 
advantages of using such estimated values. 
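The raw ingredients for such tables, the proportions of success in the upper and lower 27 per cent of the criterion distribution, can be tabulated as sketched below. This does not reproduce Flanagan's or Davis's table lookups; it only computes the two proportions, and the difference between them is shown as a crude index of discrimination. The function name and data are hypothetical.

```python
def upper_lower_proportions(item: list[int], criterion: list[float],
                            fraction: float = 0.27) -> tuple[float, float]:
    """Proportion answering the item correctly in the top and bottom
    `fraction` of the criterion distribution; the middle group is ignored."""
    order = sorted(range(len(criterion)), key=lambda i: criterion[i])
    k = max(1, round(fraction * len(criterion)))  # size of each extreme group
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item[i] for i in upper) / k
    p_lower = sum(item[i] for i in lower) / k
    return p_upper, p_lower


# Hypothetical: ten subjects ranked 0..9 on the criterion.
item = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
criterion = list(range(10))
p_u, p_l = upper_lower_proportions(item, criterion)
print(p_u, round(p_l, 3), round(p_u - p_l, 3))  # → 1.0 0.333 0.667
```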

Another statistical technique which is used in item analysis when the
criterion is numerically evaluated is that of comparing the difference
between mean criterion scores of groups responding differently to the
item. Tests of significance can be applied to this procedure when all
subjects are used.

¹ J. C. Flanagan, "General Considerations in the Selection of Test Items," Journal …
² F. B. Davis, Item-Analysis Data (Cambridge, Harvard University Press, 1946).
³ J. P. Guilford, Fundamental Statistics in Psychology and Education (New York, McGraw-Hill Book Co., Inc., 1950), p. 08.

When the criterion is not a numerically measured variable, as is some- 
times the case, chi square is used in item analysis. For example, item 
response might be compared with sex or with rural-urban residence of 
subjects or with different geographic locations of subjects. When chi 
square is used in item analysis the probability level of the significance 
of chi square is perhaps more meaningful to interpret than the chi-square 
value itself. 
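For a two-by-two comparison, such as right or wrong on one item by sex, chi square and its probability can be sketched as below. The counts are hypothetical, no continuity correction is applied (our choice), and the probability formula `erfc(sqrt(chi2 / 2))` holds only for one degree of freedom.

```python
import math


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi square for a 2x2 table and its probability (1 df)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail area, 1 degree of freedom
    return chi2, p


# Hypothetical counts: rows = boys, girls; columns = right, wrong.
chi2, p = chi_square_2x2([[30, 20], [20, 30]])
print(round(chi2, 2), round(p, 4))  # → 4.0 0.0455
```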

Except for preliminary work in selecting items for a test, it is difficult 
to justify selection on the basis of tests of significance since such tests 
are dependent upon the number of cases involved. Many more items will 
be significant with 1000 subjects than with 100 subjects. The test con- 
structor usually follows the practice of deleting items on the basis of 
their relationship with the criterion until the test is of the necessary 
length to agree with the amount of time feasible for its administration. 
Other than for the most exploratory purposes, no attempt to make an
item analysis should be made with a small number of subjects. Generally
a test constructor is satisfied with a test yielding a reliability of
0.90 with 100 items. The reliability of single items is obviously extremely 
low. Although not specifically designed for this purpose, evidence con- 
cerning the average single-item reliability of a test can be obtained by 
applying the Spearman-Brown formula to the total test reliability. When 
this formula is employed for estimating the average item reliability, a 
value of 0.08 is found for any given item in a 100-item test with a total 
test reliability of 0.90. Item analysis on the basis of scores from fewer than
one or two hundred cases provides little evidence of relative effectiveness 
of items. The requirement of a large sample is equally pertinent with 
any technique of item analysis. 
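The average single-item reliability quoted above follows from shortening the 100-item test to one item, so N = 1/100. A quick sketch, redefining the prophecy formula so that the block stands alone, confirms the 0.08 figure and shows that lengthening back by a factor of 100 recovers 0.90.

```python
def spearman_brown(r_xx: float, n: float) -> float:
    """Spearman-Brown prophecy formula (n < 1 shortens the test)."""
    return n * r_xx / (1 + (n - 1) * r_xx)


r_item = spearman_brown(0.90, 1 / 100)  # one item out of 100
print(round(r_item, 2))                       # → 0.08
print(round(spearman_brown(r_item, 100), 2))  # → 0.9
```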

In summary, item analyses yield evidence regarding the effectiveness 
of individual test items. Item difficulty and item discrimination on the 
basis of internal consistency and external criteria are the usual standards 
with which items are evaluated. The purpose of the analysis, the way 
in which subjects respond to the items, the way in which the criterion 
is measured, and the degree of precision desired determine the particular 
statistical technique upon which the item analysis is based. 


Exercises 
1. The coefficient of correlation between the odd and even items of a 100-item
test was found to be 0.815.

a. What is the Spearman-Brown estimate of the reliability of this test?


b. What would have been the reliability of the test if it had contained only 
50 items? 




c. What would the reliability of this test have been if it had contained 
150 items? 

d. What assumption regarding the items added to and subtracted from the 
100-item test is made in answering the foregoing questions? 

e. How many additional items would be required to raise the reliability of 
the 100-item test to 0.960? 

2. If the standard deviation of a distribution of test scores is 25 and the
reliability of the test with these subjects is 0.90, what is the standard error of a
test score in this distribution? What is the interpretation of this value? 

3. a. Can a test be perfectly reliable without being perfectly valid? Why? 

b. Can a test be perfectly valid without being perfectly reliable? Why? 

4. What is the estimated ultimate limit of the predictive effectiveness of a 
test for an external criterion with which it now correlates 0.55 and has a reli- 
ability coefficient of 0.85? 

5. The following interform correlations were obtained between two areas of a
test. The two forms were made for each area by scoring the odd and even items
in that area.

                      Area II
                      Odd      Even
Area I    Odd         0.939    0.926
          Even        0.932    0.926


The intraform correlation for Area I was 0.815 and for Area II was 0.979. 


a. What is the correlation between these two areas corrected for attenua- 
tion? 


b. How would you interpret this value? 


18 


Analysis of Covariance 


Although analysis of variance with single or multiple classification is 
considered a valuable tool in educational research, many problems are 
encountered which cannot be adequately treated by this statistical tech- 
nique. For example, if groups are to be compared on the basis of their 
response to a criterion, and if individual differences among the members 
within the groups are either known to influence the criterion or suspected
of such influence, an attempt must be made to control these individual 
differences. Such differences might be identified by stratification of the 
groups or by measurement of the individual characteristics by means of 
evaluation instruments if available. Whereas analysis of variance can 
be applied to data which have been stratified, it cannot simultaneously 
include other measurements in the tests of significance. 

It is apparent that if unavoidable influences on the criterion are not 
controlled, the presence or absence of differences among groups being 
compared on the basis of the criterion cannot be specifically attributed 
to the treatments being tested. To provide the investigator with a means 
of attaining a measure of control of individual differences, the statistical 
technique known as analysis of covariance was developed. Analysis of 
covariance incorporates elements of the analysis of variance and of
regression. In general, it will provide tests of significance for the
comparison of groups whose members may have been stratified and whose
members have been measured with regard to one or more variable
characteristics other than the criterion.

The applications of the analysis of covariance are numerous. When 
testing hypotheses pertaining to the differences in academic achievement
this technique is frequently used. Individual differences in ability and 
aptitude known to exist among students are frequently embodied in such 
research problems and must be considered in the treatment of the data. 
In the past such differences, on occasion, have been completely ignored 
or have been controlled by pairing on the basis of scores representing the 
differences to be considered. However, precise pairing is often difficult to
obtain, and the more effective method of control, analysis of covariance,
is used.


SINGLE CLASSIFICATION 


The mathematics of single classification analysis of covariance are 
simple, though lengthy, and can be readily explained by means of a 
sample problem. An investigator wished to determine whether college 
freshmen who had received credit for one year of high school chemistry
differed in their beginning general chemistry course achievement from 
those who did not receive credit in high school chemistry. 

As a criterion the first quarter final marks in the beginning college 
chemistry course were used. Any student who began the course and later 
dropped it, or transferred to a decelerated chemistry course, was consid- 
ered to have failed the course. 

Since scholastic aptitude and academic ability could conceivably
influence each student's response to the criterion, these individual
differences were controlled by obtaining the American Council on
Education Psychological Examination scores as a measure of scholastic
aptitude and the high school grade-point averages as a measure of
academic ability for each student in the sample. By using these scores as
control variables in the analysis of covariance, the possible bias
introduced by individual differences will be removed in so far as these
factors adequately represent the differences in question. A sample of 394
freshman students was included in the study. The sums and means of the
criterion and control variables are shown in Table 130.

TABLE 130. Sums and Means of the Criterion and Control Variables
for Beginning College Chemistry Students

RECEIVED CREDIT          FINAL MARK IN        HIGH SCHOOL         ACE PSYCHOLOGICAL
IN HIGH SCHOOL   NUMBER  COLLEGE CHEMISTRY    AVERAGE             EXAMINATION SCORES
CHEMISTRY        k       ΣY      Ȳ            ΣX1        X̄1       ΣX2       X̄2
Yes              228     488     2.14         677.83     2.97     27,651    121.28
No               166     138     0.83         483.20     2.91     18,684    112.55
Total            394     626     1.59         1,161.03   2.95     46,335    117.60

In addition, the sums of squares and sums of all possible crossproducts
in raw score form are necessary for the computation and are shown in
Table 131. It should be noted that these values are found for the entire
sample and not for either of the two subgroups individually.

TABLE 131. Summary of Experimental Data for Beginning
College Chemistry Students

SCORES                                  SYMBOLS    TOTAL FOR ENTIRE SAMPLE
Final Marks in Beginning Chemistry      ΣY²        1,628
High School Averages                    ΣX1²       3,529.5382
ACE Psychological Examination Scores    ΣX2²       5,600,809
Crossproducts                           ΣX1Y       1,970.10
                                        ΣX2Y       77,896
                                        ΣX1X2      138,659.82

The values in Table 130 and Table 131 are now used to compute the
sums of squares and the sums of crossproducts in deviation form for the
total sample and for within the subgroups. In the case of the final
chemistry marks, the calculations of the deviation values are as follows:
For total sample: 


$$\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{N} = 1628 - \frac{(626)^2}{394} = 633.3909$$


For within the subgroups: 


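$$\sum y^2 = \sum Y^2 - \sum_g \frac{(\sum Y)_g^2}{k_g} = 1628 - \left[\frac{(488)^2}{228} + \frac{(138)^2}{166}\right] = 468.7860$$

where $g$ denotes each subgroup and $k_g$ its number of cases.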
In a similar fashion the sums of squares of the deviations away from the 
mean are calculated for the high school grade-point averages (X1) and 
the ACE Psychological Examination scores (X2). 
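The same arithmetic extends to all six sums of squares and crossproducts needed for the analysis. The sketch below takes its inputs from Tables 130 and 131; the data layout and the function name `deviation_sums` are ours. Results may differ from the published deviation values in the last decimal place because of rounding.

```python
# Subgroup totals from Table 130: number of cases k, ΣY, ΣX1, ΣX2.
groups = [
    {"k": 228, "Y": 488, "X1": 677.83, "X2": 27651},  # received credit
    {"k": 166, "Y": 138, "X1": 483.20, "X2": 18684},  # no credit
]
# Raw-score sums of squares and crossproducts for the whole sample (Table 131).
raw = {("Y", "Y"): 1628, ("X1", "X1"): 3529.5382, ("X2", "X2"): 5600809,
       ("X1", "Y"): 1970.10, ("X2", "Y"): 77896, ("X1", "X2"): 138659.82}


def deviation_sums(u: str, v: str) -> tuple[float, float]:
    """(total, within-subgroups) deviation sum of crossproducts of u and v."""
    n = sum(g["k"] for g in groups)
    sum_u = sum(g[u] for g in groups)
    sum_v = sum(g[v] for g in groups)
    total = raw[(u, v)] - sum_u * sum_v / n
    within = raw[(u, v)] - sum(g[u] * g[v] / g["k"] for g in groups)
    return total, within


for pair in raw:
    t, w = deviation_sums(*pair)
    print(pair, f"total = {t:.4f}", f"within = {w:.4f}")
```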
With the data from Table 130 and Table 131 the deviation form of the
crossproduct between final chemistry marks and high school grade-point
averages is determined as follows:
For total sample:

$$\sum x_1 y = \sum X_1 Y - \frac{(\sum X_1)(\sum Y)}{N} = 1970.10 - \frac{(1161.03)(626)}{394} = 125.4172$$

For within the subgroups:

$$\sum x_1 y = \sum X_1 Y - \sum_g \frac{(\sum X_1)_g (\sum Y)_g}{k_g} = 1970.10 - \left[\frac{(677.83)(488)}{228} + \frac{(483.20)(138)}{166}\right] = 117.6096$$

where $g$ denotes each subgroup and $k_g$ its number of cases.

The two remaining crossproducts, $\sum x_2 y$ and $\sum x_1 x_2$, can be
calculated in deviation form in a similar manner. A suggested method of
organization of the deviation values of both sums of squares and sums of
crossproducts is illustrated in Table 132.

TABLE 132. Sums of Squares and Crossproducts in Deviation
Form for Both Subgroups

SOURCE OF
VARIATION   Σy²        Σx1²       Σx2²           Σx1y       Σx2y         Σx1x2
Total       633.3909   108.2421   151,742.4391   125.4172   4,277.4467   2,120.9239
Within
Subgroups   468.7860   107.8718   144,434.5982   117.6096   3,180.6759   2,068.8973

Now the point has been reached at which the regression equations for
total and within the subgroups can be calculated. Since the nature of all
the relationships is assumed to be linear and the values of the criterion,
Y, will be predicted from known values of the two control variables, X1
and X2, the general equation is

$$Y = a_1 X_1 + a_2 X_2 + C$$

Because the data have been converted to deviation form, the C-value
drops out and the general equation becomes

$$y = a_1 x_1 + a_2 x_2$$
The two normal equations necessary for the solution of each regression
equation are:

$$\sum x_1 y = a_1 \sum x_1^2 + a_2 \sum x_1 x_2$$
$$\sum x_2 y = a_1 \sum x_1 x_2 + a_2 \sum x_2^2$$

In the case of the normal equations for the total regression equation,
the values in Table 132 are substituted as follows:

$$125.4172 = 108.2421a_1 + 2{,}120.9239a_2$$
$$4{,}277.4467 = 2{,}120.9239a_1 + 151{,}742.4391a_2$$

Simultaneous solution of the normal equations yields values of $a_1$ and
$a_2$ which are substituted in the regression equation so that

$$y = 0.83502669x_1 + 0.016517585x_2$$
This equation provides the necessary elements for the computation of
the sum of squares of residuals for total. As described in the chapter
concerning regression, the sum of squares of residuals is equal to

$$\sum y^2 - \left(a_1 \sum x_1 y + a_2 \sum x_2 y\right)$$

Therefore, upon substituting the appropriate values from Table 132 and
the $a_1$ and $a_2$ values of the total regression equation,

$$\text{S.S. of residuals for total} = 633.3909 - [(0.83502669)(125.4172) + (0.016517585)(4277.4467)] = 458.0111$$


By the same procedure, the within-subgroups deviation values in Table
132 are substituted into the same normal equations and the within
regression equation computed. This equation is

$$y = 0.92091382x_1 + 0.0088302930x_2$$


It should be noted that the coefficients of $x_1$ and $x_2$ are of the same
order of magnitude as the corresponding coefficients of the regression
equation for the total sample. This condition generally, though not
invariably, exists.

The values of $a_1$ and $a_2$ from the within regression equation can be combined with the appropriate values in Table 132 to calculate the sum of squares of residuals for within subgroups.

$$\text{S.S. of residuals for within} = \Sigma y^2 - (a_1 \Sigma x_1 y + a_2 \Sigma x_2 y) = 468.7860 - [(0.92091382)(117.6096) + (0.0088302930)(3{,}180.6759)] = 332.3914$$
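The arithmetic of the two normal equations and of both residual sums of squares can be checked with a short Python sketch. It assumes only the deviation sums printed in Table 132 and solves the 2 × 2 system by Cramer's rule; the function name `residual_ss_two_controls` is an illustrative label of ours, not part of the text.

```python
def residual_ss_two_controls(syy, s11, s22, s1y, s2y, s12):
    """Solve the two normal equations
        s1y = a1*s11 + a2*s12
        s2y = a1*s12 + a2*s22
    by Cramer's rule; return (a1, a2, sum of squares of residuals)."""
    det = s11 * s22 - s12 * s12
    a1 = (s1y * s22 - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    # Residual S.S. = Σy² minus the part accounted for by the regression
    resid = syy - (a1 * s1y + a2 * s2y)
    return a1, a2, resid


# Deviation sums from Table 132, in the order Σy², Σx1², Σx2², Σx1y, Σx2y, Σx1x2
total = residual_ss_two_controls(633.3909, 108.2421, 151742.4391, 125.4172, 4277.4467, 2120.9239)
within = residual_ss_two_controls(468.7860, 107.8718, 144434.5982, 117.6096, 3180.6759, 2068.8973)
print("total:", total)
print("within:", within)
```

The residuals should agree with 458.0111 and 332.3914 to within rounding.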


A test of significance can now be made of the null hypothesis that 
students who received credit in high school chemistry do not differ in 
achievement in first-quarter beginning college chemistry from those stu- 
dents who have not received credit in high school chemistry. The analysis 
of covariance is shown in Table 133. 


TABLE 133. Test of Significance of Influence of High School
Chemistry on Achievement in Beginning College Chemistry

                                    RESIDUALS
SOURCE OF            DEGREES OF     SUM OF        MEAN
VARIATION            FREEDOM        SQUARES       SQUARE
Total                  391          458.0111
Within Subgroups       390          332.3914        0.8523
Difference               1          125.6197      125.6197

$$F_{1,390} = \frac{125.6197}{0.8523} = 147.39 \qquad t = \sqrt{F} = \sqrt{147.39} = 12.14$$


The difference, 125.6197, is that part of the sum of squares of residuals for the total sample which is due to the high school chemistry background.

Mean squares are found by dividing the sums of squares by the appropriate degrees of freedom, and the F-value is computed in the same manner as in the analysis of variance.

The determination of the number of degrees of freedom for within subgroups deserves a word of explanation. The number of degrees of freedom in the entire sample is one less than the number of cases, or 393. However, each of the control factors, $X_1$ and $X_2$, accounts for one degree of freedom, or a total of two, whereas the high school chemistry background relationship also removes a single degree. Therefore, three degrees of freedom can be assigned to specific sources, and the number of degrees of freedom for within subgroups is 390.
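The degrees-of-freedom bookkeeping and the F-ratio of Table 133 can be expressed as a small Python function. The sketch assumes N = 394 cases, which follows from the statement that the entire sample has 393 degrees of freedom; the function name `ancova_one_way_f` is illustrative.

```python
def ancova_one_way_f(ss_total_resid, ss_within_resid, n_cases, n_groups, n_controls):
    """F-ratio for a single-classification analysis of covariance.

    Each control variable removes one degree of freedom from both the
    total and the within residual lines; the subgroup classification
    removes n_groups - 1 more from the within line."""
    df_total = n_cases - 1 - n_controls
    df_within = n_cases - n_groups - n_controls
    ms_difference = (ss_total_resid - ss_within_resid) / (df_total - df_within)
    ms_within = ss_within_resid / df_within
    return df_total, df_within, ms_difference / ms_within


df_total, df_within, f = ancova_one_way_f(458.0111, 332.3914, n_cases=394, n_groups=2, n_controls=2)
print(df_total, df_within, round(f, 2))
```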

The F-value of 147.39 with 1 and 390 degrees of freedom is significant beyond the 5 per cent level of significance. Therefore, when the criterion means of the two subgroups are adjusted for individual differences in scholastic aptitude and academic ability, the difference between the first-quarter mean achievement of students who had received credit in high school chemistry and that of those who had not received such credit is so large that it is very unlikely to have been caused by a sampling accident. Presumably the difference can be attributed to the presence or absence of high school chemistry in the secondary school preparation.

To adjust the criterion means of the subgroups, the within regression equation is used. Inspection of the means of all variables as shown in Table 130 reveals that the students who had received credit in high school chemistry surpassed those who had not with regard to the criterion and both control variables. Apparently a part of the difference between the criterion means is caused by the fact that the students having received credit in high school chemistry are a superior group on the basis of the control variables.

To determine the size of the adjustment of each criterion mean, the differences between the subgroup means of the control variables and the general means of the control variables are substituted in the within subgroups regression equation and the adjustment term is calculated. Note that the coefficients may be negative in some problems; they must then, of course, be applied with their signs.

For students who had received credit in high school chemistry (subscript c), with $\bar{X}_1$ and $\bar{X}_2$ denoting the general means of the control variables:

$$\text{Adjustment of } \bar{Y}_c = (\bar{X}_{1c} - \bar{X}_1)a_1 + (\bar{X}_{2c} - \bar{X}_2)a_2 = (\ldots - 2.95)(0.9209) + (121.28 - 117.60)(0.008830) = 0.05$$

This value is subtracted from 2.14 because of this subgroup's superiority over the other. The adjusted criterion mean is then 2.09.

For students who had not received credit in high school chemistry (subscript n):

$$\text{Adjustment of } \bar{Y}_n = (\bar{X}_{1n} - \bar{X}_1)a_1 + (\bar{X}_{2n} - \bar{X}_2)a_2 = (2.91 - 2.95)(0.9209) + (112.55 - 117.60)(0.008830) = -0.08$$

This value is added to 0.83 because those students who had not received credit in high school chemistry were penalized by lower scholastic aptitude and ability. The adjusted criterion mean is then 0.91.

It should be noted that whether the adjustment term is added to or subtracted from the criterion mean depends upon the particular subgroup being considered, that is, upon whether its control-variable means lie above or below the general means. It should be further noted that this adjustment process is only in order when significant F-values have been found in the analysis of covariance.
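The adjustment rule can be sketched in Python under the assumption, consistent with both worked cases above, that the adjusted mean equals the criterion mean minus the signed adjustment term; a negative term therefore raises the mean. The function name `adjusted_mean` is our own, and the figures are those given for the students without high school chemistry credit.

```python
def adjusted_mean(y_mean, x_means, general_x_means, coefs):
    """Adjust a subgroup criterion mean with within-subgroups regression coefficients.

    adjustment = sum of a_i * (subgroup mean of X_i - general mean of X_i);
    the signed adjustment is subtracted, so a negative term raises the mean."""
    adjustment = sum(a * (xm - gm) for a, xm, gm in zip(coefs, x_means, general_x_means))
    return y_mean - adjustment, adjustment


coefs = (0.9209, 0.008830)   # within regression coefficients a1, a2
new_mean, adjustment = adjusted_mean(0.83, (2.91, 112.55), (2.95, 117.60), coefs)
print(round(adjustment, 2), round(new_mean, 2))
```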

A comparison of the adjusted criterion means reveals that the students who had received credit in high school chemistry are favored. Hence it can be concluded that, in so far as scholastic aptitude is controlled by ACE Psychological Examination raw scores, academic ability is controlled by high school grade-point averages, and no other pertinent factors related to college chemistry achievement contribute a bias, evidence has been found that students who had received credit in high school chemistry surpassed students who had not in achievement in the first quarter of beginning college chemistry.

Analysis of covariance with single classification need not be restricted to two subgroups as in the foregoing example. As in the case of analysis of variance, any number of subgroups can be included in the computation. For example, a research worker wants to determine whether elementary school pupils enrolled in three different types of schools differ in arithmetic achievement as measured by the Stanford Achievement Test. The three types of schools include the one-room rural school, in which one teacher instructs eight grades; the combination school, in which more than one teacher has been employed, but not as many as one teacher for each grade; and the urban school, in which there is at least one teacher for each grade.

TABLE 134. Sums and Means of the Criterion and Control Variable
of Elementary School Pupils

                                 STANFORD ACHIEVEMENT TEST     INTELLIGENCE
TYPE OF            NUMBER        TOTAL ARITHMETIC SCORE        QUOTIENT
SCHOOL ATTENDED      k             ΣY         MEAN             ΣX        MEAN
Urban               80           4,624       57.80           7,746      96.83
Combination         80           3,801       47.51           7,023      87.79
One-Room Rural      80           4,035       50.44           7,457      93.21
Total              240          12,460       51.92          22,226      92.61

The criterion is the total score made on the arithmetic section of the Stanford Achievement Test, this score consisting of the sum of the scores made on the computational and reasoning subdivisions. Since differences in scholastic aptitude could influence a pupil's response to the criterion, the Otis Quick-Scoring Mental Ability Test was administered to all pupils, and these scores are used as a control variable. A sample of 240 seventh-grade pupils was drawn, and the sums and means of the criterion and control variable were computed as shown in Table 134.

Also necessary to the calculation are the sums of squares and the sum of the crossproducts in raw score form. These values for the entire sample are shown in Table 135.

The sums of squares and the sum of the crossproducts in deviation form are computed from the values in Tables 134 and 135. As in the case of the foregoing problems, these values must be found both for the total sample and for within the three subgroups.

TABLE 135. Summary of Experimental Data
for Seventh-Grade Pupils

                                                 TOTAL FOR
SCORES                            SYMBOLS        ENTIRE SAMPLE
Arithmetic Achievement Scores     ΣY²              672,112
Intelligence Quotients            ΣX²            2,096,948
Crossproducts                     ΣXY            1,166,054


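For the total sample:

$$\Sigma y^2 = \Sigma Y^2 - \frac{(\Sigma Y)^2}{N} = 672{,}112 - \frac{(12{,}460)^2}{240} = 25{,}230.33$$

For within subgroups (subscripts u, c, and r denote the urban, combination, and one-room rural schools):

$$\Sigma y^2 = \Sigma Y^2 - \left[\frac{(\Sigma Y_u)^2}{k_u} + \frac{(\Sigma Y_c)^2}{k_c} + \frac{(\Sigma Y_r)^2}{k_r}\right] = 672{,}112 - \left[\frac{(4624)^2}{80} + \frac{(3801)^2}{80} + \frac{(4035)^2}{80}\right] = 20{,}734.48$$

The deviation values of the control variable, intelligence quotients, are computed in the same manner.

In the case of the crossproduct:

For the total sample:

$$\Sigma xy = \Sigma XY - \frac{(\Sigma X)(\Sigma Y)}{N} = 1{,}166{,}054 - \frac{(22{,}226)(12{,}460)}{240} = 12{,}154.17$$

For within subgroups:

$$\Sigma xy = \Sigma XY - \left[\frac{(\Sigma X_u)(\Sigma Y_u)}{k_u} + \frac{(\Sigma X_c)(\Sigma Y_c)}{k_c} + \frac{(\Sigma X_r)(\Sigma Y_r)}{k_r}\right] = 1{,}166{,}054 - \left[\frac{(7746)(4624)}{80} + \frac{(7023)(3801)}{80} + \frac{(7457)(4035)}{80}\right] = 8{,}542.48$$

All deviation values are shown in Table 136.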
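Both formulas above are instances of one computation: raw sum of products minus a correction term, where the correction uses the grand totals for the total line and the subgroup totals for the within line. A minimal Python sketch of that idea, using the Table 134 and 135 values (the function name `deviation_sums` is hypothetical):

```python
def deviation_sums(totals_u, totals_v, sizes, raw_sum_uv):
    """Total and within-subgroups sums of products in deviation form.

    With totals_u == totals_v this gives sums of squares; otherwise it gives
    the sum of crossproducts. totals_* are subgroup sums; sizes are the k values."""
    n = sum(sizes)
    total = raw_sum_uv - sum(totals_u) * sum(totals_v) / n
    within = raw_sum_uv - sum(tu * tv / k for tu, tv, k in zip(totals_u, totals_v, sizes))
    return total, within


k = [80, 80, 80]                 # urban, combination, one-room rural
y_totals = [4624, 3801, 4035]    # arithmetic scores (Table 134)
x_totals = [7746, 7023, 7457]    # intelligence quotients (Table 134)

syy = deviation_sums(y_totals, y_totals, k, 672_112)
sxx = deviation_sums(x_totals, x_totals, k, 2_096_948)
sxy = deviation_sums(x_totals, y_totals, k, 1_166_054)
print(syy, sxx, sxy)
```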


TABLE 136. Sums of Squares and Crossproduct in Deviation Form
for Three Subgroups

SOURCE OF VARIATION          Σy²           Σx²          Σxy
Total                     25,230.33     38,635.184    12,154.17
Within Subgroups          20,734.48     35,324.325     8,542.48


Once again regression equations for the total sample and for within the subgroups can be found. Since only one control variable was included in the problem, the equation in deviation form is:

$$y = ax$$

The value of a is computed from the normal equation

$$\Sigma xy = a\Sigma x^2 \quad \text{or} \quad a = \frac{\Sigma xy}{\Sigma x^2}$$

As was demonstrated in the linear regression chapter, when only one X-variable is involved in the analysis, the value of a need not be calculated as such in order to solve for the sum of squares due to residuals. Instead,

$$\text{S.S. of residuals} = \Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2}$$

Therefore, upon substitution of the values in Table 136,

$$\text{S.S. of residuals for total} = 25{,}230.33 - \frac{(12{,}154.17)^2}{38{,}635.184} = 21{,}406.772$$

$$\text{S.S. of residuals for within subgroups} = 20{,}734.48 - \frac{(8{,}542.48)^2}{35{,}324.325} = 18{,}668.652$$


The sums of squares of residuals are entered in Table 137, and the null hypothesis that seventh-grade pupils enrolled in the three different types of schools do not differ in their arithmetic achievement can be tested. The analysis is computed in the usual manner. The F-value of 17.31 with 2 and 236 degrees of freedom is significant beyond the 5 per cent level. Therefore, to the degree that scholastic aptitude is controlled by intelligence quotients, and to the degree that all other pertinent factors related to achievement in arithmetic have not introduced a bias in this study, there is little doubt that the pupils enrolled in the three types of schools differ in arithmetic achievement.

TABLE 137. Analysis of Covariance of Arithmetic Scores
of Seventh-Grade Pupils

                                    RESIDUALS
SOURCE OF            DEGREES OF     SUM OF         MEAN
VARIATION            FREEDOM        SQUARES        SQUARE
Total                  238         21,406.772
Within Subgroups       236         18,668.652       79.104
Difference               2          2,738.120    1,369.060

$$F_{2,236} = \frac{1{,}369.060}{79.104} = 17.31$$
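With a single control variable, the residual sums of squares need only Σy², Σx², and Σxy. The following Python sketch reproduces Table 137 from the Table 136 values; the function name `residual_ss_one_control` is our own, and the degrees of freedom follow the same bookkeeping as in the two-subgroup example (one degree removed for the control variable).

```python
def residual_ss_one_control(syy, sxx, sxy):
    """Sum of squares of residuals with one control variable: Σy² - (Σxy)²/Σx²."""
    return syy - sxy ** 2 / sxx


resid_total = residual_ss_one_control(25230.33, 38635.184, 12154.17)
resid_within = residual_ss_one_control(20734.48, 35324.325, 8542.48)

n_cases, n_groups = 240, 3
df_total = n_cases - 1 - 1            # one degree lost to the control variable
df_within = n_cases - n_groups - 1
ms_difference = (resid_total - resid_within) / (df_total - df_within)
ms_within = resid_within / df_within
f = ms_difference / ms_within
print(round(resid_total, 3), round(resid_within, 3), round(f, 2))
```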
Since a significant F-value has been found, it is again appropriate to compute the adjusted criterion means. The within subgroups regression coefficient is found from the formula

$$a = \frac{\Sigma xy}{\Sigma x^2} = \frac{8{,}542.48}{35{,}324.325} = 0.2418$$

Hence the within regression equation is

$$y = 0.2418x$$
To find the adjustment terms of the criterion means, the means in Table 134 are substituted as follows:

For urban school pupils:

$$\text{Adjustment of } \bar{Y}_u = (\bar{X}_u - \bar{X})a = (96.83 - 92.61)(0.2418) = 1.02$$

For combination school pupils:

$$\text{Adjustment of } \bar{Y}_c = (\bar{X}_c - \bar{X})a = (87.79 - 92.61)(0.2418) = -1.17$$

For one-room rural school pupils:

$$\text{Adjustment of } \bar{Y}_r = (\bar{X}_r - \bar{X})a = (93.21 - 92.61)(0.2418) = 0.15$$


In the case of the urban school pupils and the one-room rural school pupils, the adjustments are subtracted from the criterion means, yielding 56.78 and 50.29, respectively. The combination school pupils' adjustment term, being negative, is in effect added, yielding 48.68.
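The three adjusted means can be reproduced in a few lines of Python. This is a sketch that takes the rounded means of Table 134 as given and uses the unrounded ratio 8,542.48/35,324.325 for the within coefficient; the dictionary layout is simply a convenient choice.

```python
a = 8542.48 / 35324.325          # within-subgroups regression coefficient (about 0.2418)
general_iq_mean = 92.61

# (criterion mean, control-variable mean) for each type of school, Table 134
schools = {
    "urban": (57.80, 96.83),
    "combination": (47.51, 87.79),
    "one-room rural": (50.44, 93.21),
}

# Adjusted mean = criterion mean - a * (subgroup control mean - general control mean)
adjusted = {name: y - a * (x - general_iq_mean) for name, (y, x) in schools.items()}
for name, value in adjusted.items():
    print(f"{name}: {value:.2f}")
```

The printed values agree with the adjusted means 56.78, 48.68, and 50.29 given above.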

It should be noted that the analysis of covariance has demonstrated only that the three adjusted means are not all equal. No evidence has yet been presented that the difference between any particular pair of the three means is significant. As in the case of analysis of variance, the results of the analysis indicate areas where additional research may be desirable.


MULTIPLE CLASSIFICATION 


On frequent occasions in educational research, meaningful stratification of data can be obtained with a little effort and foresight. The investment of such effort is extremely profitable in that more sensitive tests of significance are possible because of the additional controls, and greater amounts of information can be derived from a given set of data because more hypotheses are being tested.

Just as analysis of variance can be expanded to include multiple clas- 
sification of the data, so can analysis of covariance be expanded. The 
calculations are patterned after the single classification procedure and 
deviate from it in only minor respects. 

Stratification of data for the purpose of removing a possible source of bias has repeated applications in research in the social sciences. For example, an instructor desires to compare two different methods of teaching beginning college chemistry to freshmen in terms of academic achievement. As a criterion, the first-quarter final examination scores are selected. To control on individual differences in aptitude and ability which may have an unbalanced influence on the mean criterion of the groups of students taught by the different methods, the ACE Psychological Examination raw scores are used as a scholastic aptitude control, the high school grade-point averages as a prior achievement control, and the final examination scores in college algebra as a concurrent achievement control.

TABLE 138. Sums and Means of Test Scores for the Four Subgroups
(Rows: the groups taught by Method A and by Method B, each divided into students who had and had not received credit in high school chemistry, 20 students per subgroup, with a subtotal for each method and a total for all 80 students. Columns: number of students; sums and means of beginning chemistry final examination scores, Y; ACE Psychological Examination scores, X1; high school grade-point averages, X2; and college algebra final examination scores, X3.)

Although these control variables may seem adequate, the investigator 
is aware that, as demonstrated in the first example in the chapter, credit 
in high school chemistry generally improves the student's achievement in 
the first quarter of beginning college chemistry. Since some of the chem- 
istry students included in the sample will have received credit in high 
school chemistry whereas others have not, failing to stratify on the basis 
of high school chemistry status might allow an unnecessary bias to remain 
in the study. 


TABLE 139. Summary of Experimental Data for Students Taught
Beginning College Chemistry by Two Different Methods

SCORES                                          SYMBOL     TOTAL OF BOTH GROUPS
Beginning Chemistry Final Examination Scores    ΣY²            495,292
ACE Psychological Examination Scores            ΣX1²         1,082,055
High School Averages                            ΣX2²               651.4826
College Algebra Final Examination Scores        ΣX3²            48,340.25
Crossproducts                                   ΣX1Y           714,564
                                                ΣX2Y            17,557.56
                                                ΣX3Y           147,436
                                                ΣX1X2           25,997.51
                                                ΣX2X3            5,304.66
                                                ΣX1X3          216,802


Stratification will also allow the testing of three hypotheses rather than one. The first and perhaps most interesting hypothesis is that students taught by Method A do not differ in chemistry achievement from students taught by Method B. Secondly, a test can be made to determine whether having or not having received credit in high school chemistry influences first-quarter beginning chemistry achievement, irrespective of the method by which it is taught. Thirdly, a test can be made of the null hypothesis that the difference in first-quarter chemistry achievement between students having and not having received credit in high school chemistry is the same whether they are taught by Method A or by Method B. The last two hypotheses, when tested, will help to support and enlarge the initial hypothesis concerning the effectiveness of the two methods of teaching.

Stratification on the basis of high school chemistry was incorporated in the design of the experiment by subdividing the group being taught by each method into those having received credit and those not having received credit in high school chemistry. A sample of 80 freshmen was drawn so that 20 appeared in each of the four subgroups, as shown in Table 138. Equal numbers of cases were used so as to avoid disproportionality.¹

The sums of the raw scores in Table 138 plus the sums of squares and 
all possible crossproducts in Table 139 provide the necessary information 
to compute the deviation forms of the sums of squares and crossproducts. 

Again it is necessary to calculate the deviation values for all sources 
of variation. In multiple classification it is possible to identify not only 
those values associated with the total sample and within the subgroups, 
but also those associated with the method and high school chemistry main 
effects and the interaction between the two. To illustrate in the case of 
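beginning chemistry final examination scores, Y (subscripts A and B denote the two methods; c and n denote students with and without credit in high school chemistry; k is the number of cases in the group indicated):

For total sample:

$$\Sigma y^2 = \Sigma Y^2 - \frac{(\Sigma Y)^2}{N} = 495{,}292 - \frac{(6140)^2}{80} = 24{,}047.0$$

For method:

$$\Sigma y^2 = \frac{(\Sigma Y_A)^2}{k_A} + \frac{(\Sigma Y_B)^2}{k_B} - \frac{(\Sigma Y)^2}{N} = \frac{(3168)^2}{40} + \frac{(2972)^2}{40} - \frac{(6140)^2}{80} = 480.2$$

For high school chemistry status:

$$\Sigma y^2 = \frac{(\Sigma Y_c)^2}{k_c} + \frac{(\Sigma Y_n)^2}{k_n} - \frac{(\Sigma Y)^2}{N} = \frac{(1636 + 1608)^2}{40} + \frac{(1532 + 1364)^2}{40} - \frac{(6140)^2}{80} = 1513.8$$

For interaction:

$$\Sigma y^2 = \frac{(\Sigma Y_{Ac})^2}{k_{Ac}} + \frac{(\Sigma Y_{An})^2}{k_{An}} + \frac{(\Sigma Y_{Bc})^2}{k_{Bc}} + \frac{(\Sigma Y_{Bn})^2}{k_{Bn}} - \frac{(\Sigma Y)^2}{N} - (\text{S.S. for method} + \text{S.S. for high school chemistry})$$
$$= \frac{(1636)^2}{20} + \frac{(1532)^2}{20} + \frac{(1608)^2}{20} + \frac{(1364)^2}{20} - \frac{(6140)^2}{80} - (480.2 + 1513.8) = 245.0$$

For within subgroups:

$$\Sigma y^2 = \Sigma Y^2 - \left[\frac{(1636)^2}{20} + \frac{(1532)^2}{20} + \frac{(1608)^2}{20} + \frac{(1364)^2}{20}\right] = 495{,}292 - 473{,}484 = 21{,}808.0$$

or, by subtraction,

$$\Sigma y^2 = \text{S.S. for total} - (\text{S.S. for method} + \text{S.S. for high school chemistry} + \text{S.S. for interaction}) = 24{,}047.0 - (480.2 + 1513.8 + 245.0) = 21{,}808.0$$

¹ Adjustment of disproportionate su… computation can be made but it is a… …stration of the procedure has been developed by Tsao.²

² Fei Tsao, "General Solution of the Analysis o… …e of Unequal and Disproportionate Numbers o…," Psychometrika, 11:107-128, 1946.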
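The five-way partition of the criterion sum of squares depends only on the four cell totals, the common cell size, and ΣY². The following Python sketch, written for the equal-cell-size 2 × 2 layout of this example (the function name `two_way_ss` is hypothetical), reproduces each line above.

```python
def two_way_ss(cells, n_cell, raw_sum_sq):
    """Deviation-form sums of squares for a 2 x 2 classification with equal cell sizes.

    cells[i][j] is the criterion total for method i (A, B) and
    high school chemistry status j (credit, no credit)."""
    n = 4 * n_cell
    correction = sum(map(sum, cells)) ** 2 / n
    method = sum(sum(row) ** 2 for row in cells) / (2 * n_cell) - correction
    status = sum((cells[0][j] + cells[1][j]) ** 2 for j in range(2)) / (2 * n_cell) - correction
    cell_term = sum(t ** 2 for row in cells for t in row) / n_cell
    between_cells = cell_term - correction
    return {
        "total": raw_sum_sq - correction,
        "method": method,
        "status": status,
        "interaction": between_cells - (method + status),
        "within": raw_sum_sq - cell_term,
    }


y_cells = [[1636, 1532],    # Method A: credit, no credit
           [1608, 1364]]    # Method B: credit, no credit
ss_y = two_way_ss(y_cells, n_cell=20, raw_sum_sq=495_292)
for source, value in ss_y.items():
    print(f"{source}: {value:.1f}")
```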

The sums of squares of the deviations from the mean are computed in a similar manner for ACE scores, $X_1$, high school grade-point averages, $X_2$, and algebra examination scores, $X_3$.

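The calculation of the crossproducts in deviation form is illustrated by that between the ACE scores and the criterion (subscripts A and B denote the methods; c and n, students with and without high school chemistry credit):

For total sample:

$$\Sigma x_1 y = \Sigma X_1 Y - \frac{(\Sigma X_1)(\Sigma Y)}{N} = 714{,}564 - \frac{(9163)(6140)}{80} = 11{,}303.75$$

For method:

$$\Sigma x_1 y = \frac{(\Sigma X_{1A})(\Sigma Y_A)}{k_A} + \frac{(\Sigma X_{1B})(\Sigma Y_B)}{k_B} - \frac{(\Sigma X_1)(\Sigma Y)}{N} = \frac{(4688)(3168)}{40} + \frac{(4475)(2972)}{40} - \frac{(9163)(6140)}{80} = 521.85$$

For high school chemistry status:

$$\Sigma x_1 y = \frac{(\Sigma X_{1c})(\Sigma Y_c)}{k_c} + \frac{(\Sigma X_{1n})(\Sigma Y_n)}{k_n} - \frac{(\Sigma X_1)(\Sigma Y)}{N} = \frac{(2247 + 2337)(1636 + 1608)}{40} + \frac{(2441 + 2138)(1532 + 1364)}{40} - \frac{(9163)(6140)}{80} = 21.75$$

For interaction:

$$\Sigma x_1 y = \frac{(\Sigma X_{1Ac})(\Sigma Y_{Ac})}{k_{Ac}} + \frac{(\Sigma X_{1An})(\Sigma Y_{An})}{k_{An}} + \frac{(\Sigma X_{1Bc})(\Sigma Y_{Bc})}{k_{Bc}} + \frac{(\Sigma X_{1Bn})(\Sigma Y_{Bn})}{k_{Bn}} - \frac{(\Sigma X_1)(\Sigma Y)}{N} - (\text{method value} + \text{high school chemistry value})$$
$$= \frac{(2247)(1636)}{20} + \frac{(2441)(1532)}{20} + \frac{(2337)(1608)}{20} + \frac{(2138)(1364)}{20} - \frac{(9163)(6140)}{80} - (521.85 + 21.75) = 687.75$$

For within subgroups:

$$\Sigma x_1 y = \Sigma X_1 Y - \left[\frac{(2247)(1636)}{20} + \frac{(2441)(1532)}{20} + \frac{(2337)(1608)}{20} + \frac{(2138)(1364)}{20}\right] = 714{,}564 - 704{,}491.60 = 10{,}072.40$$

or, by subtraction,

$$\Sigma x_1 y = \text{total} - (\text{method} + \text{high school chemistry} + \text{interaction}) = 11{,}303.75 - (521.85 + 21.75 + 687.75) = 10{,}072.40$$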
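The crossproduct partition follows the same pattern as the sums of squares, with each squared total replaced by the product of the two totals. A Python sketch for the equal-cell-size 2 × 2 layout, computing the within line "by subtraction" as in the text (the function name `two_way_sp` is hypothetical):

```python
def two_way_sp(cells_x, cells_y, n_cell, raw_sum_xy):
    """Deviation-form sums of crossproducts for a 2 x 2 classification, equal cell sizes.

    cells_x[i][j] and cells_y[i][j] are cell totals for method i (A, B)
    and high school chemistry status j (credit, no credit)."""
    n = 4 * n_cell
    correction = sum(map(sum, cells_x)) * sum(map(sum, cells_y)) / n
    method = sum(sum(rx) * sum(ry) for rx, ry in zip(cells_x, cells_y)) / (2 * n_cell) - correction
    status = sum(
        (cells_x[0][j] + cells_x[1][j]) * (cells_y[0][j] + cells_y[1][j]) for j in range(2)
    ) / (2 * n_cell) - correction
    between_cells = sum(
        cells_x[i][j] * cells_y[i][j] for i in range(2) for j in range(2)
    ) / n_cell - correction
    total = raw_sum_xy - correction
    interaction = between_cells - (method + status)
    within = total - (method + status + interaction)   # "by subtraction"
    return {"total": total, "method": method, "status": status,
            "interaction": interaction, "within": within}


x1_cells = [[2247, 2441], [2337, 2138]]   # ACE totals; rows Method A, B; columns credit, no credit
y_cells = [[1636, 1532], [1608, 1364]]    # criterion totals in the same layout
sp = two_way_sp(x1_cells, y_cells, n_cell=20, raw_sum_xy=714_564)
for source, value in sp.items():
    print(f"{source}: {value:.2f}")
```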
All other deviation values of the crossproducts are computed by the same 
procedure. All deviation values are shown in Table 140. 

The sums of squares and sums of crossproducts for each main effect and 
the interaction are then separately added to the appropriate within sub- 
groups sum of squares or crossproducts to form the “within plus” values 
in Table 141. In the case of the criterion: 


For within plus method: 21,808.0 + 480.2 = 22,288.2

For within plus high school chemistry: 21,808.0 + 1,513.8 = 23,321.8

For within plus interaction: 21,808.0 + 245.0 = 22,053.0

The “within plus” values are substituted in normal equations to obtain 
regression equations for the two main effects and the interaction. This 
procedure may seem to be quite different from the single-classification 
analysis of covariance in which only regression equations for the total 
sample and for within the groups were computed. Closer examination will 
reveal, however, that the total regression equation of the single-classifi- 
cation calculations is nothing more than a “within plus main effect” re- 
gression equation computed in a simpler manner. Since multiple classification allows the identification of more sources of variation, the systematic calculation of "within plus" values suggested here is used.

The regression equation in deviation form for three X-variables is:

$$y = a_1 x_1 + a_2 x_2 + a_3 x_3$$

To solve for $a_1$, $a_2$, and $a_3$, the necessary normal equations are:

$$\Sigma x_1 y = a_1 \Sigma x_1^2 + a_2 \Sigma x_1 x_2 + a_3 \Sigma x_1 x_3$$
$$\Sigma x_2 y = a_1 \Sigma x_1 x_2 + a_2 \Sigma x_2^2 + a_3 \Sigma x_2 x_3$$
$$\Sigma x_3 y = a_1 \Sigma x_1 x_3 + a_2 \Sigma x_2 x_3 + a_3 \Sigma x_3^2$$

In the case of the within plus method values, the substitution yields:

$$10{,}594.25 = 30{,}616.994a_1 + 370.48075a_2 + 6{,}479.069a_3$$
$$397.533 = 370.48075a_1 + 26.17346a_2 + 179.22863a_3$$
$$7{,}030.375 = 6{,}479.069a_1 + 179.22863a_2 + 6{,}167.065a_3$$

Simultaneous solution of the three equations produces:

$$a_1 = 0.068797128 \qquad a_2 = 8.6183480 \qquad a_3 = 0.81724110$$


TABLE 140. Deviation Form Sums of Squares and Crossproducts for All Sources of Variation
(Rows: Total, Method, High School Chemistry Status, Interaction, and Within Subgroups. Columns: the sums of squares Σy², Σx1², Σx2², Σx3² and the crossproducts Σx1y, Σx2y, Σx3y, Σx1x2, Σx1x3, Σx2x3.)

TABLE 141. Deviation Form Sums of Squares and Crossproducts for Sources of Variation Combined with Within
(Rows: Within plus Method, Within plus High School Chemistry Status, and Within plus Interaction. Columns as in Table 140.)

TABLE 142. Regression Equations for Sources of Variation Combined with Within
(Rows: Method, High School Chemistry Status, and Interaction, each combined with within, and Within Alone. Columns: regression equation, degrees of freedom, and sum of squares of residuals.)

TABLE 143. Test of Significance of the Method Relationship

                                    RESIDUALS
SOURCE OF            DEGREES OF     SUM OF          MEAN
VARIATION            FREEDOM        SQUARES         SQUARE
Within + Method         74        12,387.75693
Within Alone            73        12,247.74850     167.77736
Method                   1           140.00843     140.00843

$$F_{1,73} = \frac{140.00843}{167.77736} = 0.83$$


Hence, the within plus method regression equation is

$$y = 0.068797128x_1 + 8.6183480x_2 + 0.81724110x_3$$

The sum of squares of residuals for within plus method is found from the equation:

$$\text{S.S. of residuals} = \Sigma y^2 - (a_1 \Sigma x_1 y + a_2 \Sigma x_2 y + a_3 \Sigma x_3 y)$$

Substituting the values of the coefficients and the appropriate values from Table 141:

$$\text{S.S. of residuals} = 22{,}288.2 - [(0.068797128)(10{,}594.25) + (8.6183480)(397.533) + (0.81724110)(7{,}030.375)] = 12{,}387.75693$$
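Solving three normal equations by hand is laborious; a short Python sketch shows the same computation with Gaussian elimination (partial pivoting), followed by the residual sum of squares. Only the within plus method values quoted above are used; the function name `solve_normal_equations` is our own, and any standard linear-equation solver would serve equally well.

```python
def solve_normal_equations(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coefs = [0.0] * n
    for r in range(n - 1, -1, -1):       # back substitution
        tail = sum(aug[r][c] * coefs[c] for c in range(r + 1, n))
        coefs[r] = (aug[r][n] - tail) / aug[r][r]
    return coefs


# "Within plus method" deviation values: the Σx_i x_j matrix and the Σx_i y vector
sxx = [[30616.994, 370.48075, 6479.069],
       [370.48075, 26.17346, 179.22863],
       [6479.069, 179.22863, 6167.065]]
sxy = [10594.25, 397.533, 7030.375]
syy = 22288.2

a = solve_normal_equations(sxx, sxy)
resid = syy - sum(ai * si for ai, si in zip(a, sxy))
print(a, resid)
```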


The remaining two "within plus" regression equations are found in this manner. The equations and their respective sums of squares of residuals are shown in Table 142.

A regression equation for within subgroups alone is found within the 
four subgroups by substituting the within subgroup deviation values in 
Table 140 in the same normal equations and solving simultaneously. The 
results are also shown in Table 142. 

A test of significance for each source of variation can now be computed. In the case of the method main effect, first the residuals sum of squares for within alone is subtracted from the residuals sum of squares for within plus method. Then the degrees of freedom for within subgroups are found by subtracting the three degrees associated with the control variables and the three associated with the two main effects and the interaction from 79, leaving 73. Mean squares are found, and the F-value is computed in the usual manner,¹ as shown in Table 143. The F-values for the remaining sources of variation are computed by the same procedure and are listed in Table 144.

TABLE 144. Tests of Significance for All Sources of Variation

                                                  RESIDUALS
                                 DEGREES OF     SUM OF          MEAN
SOURCE OF VARIATION              FREEDOM        SQUARES         SQUARE           F
Method                              1             140.00843      140.00843      0.83
High School Chemistry Status        1           2,296.90571    2,296.90571     13.69
Interaction                         1               0.09311        0.09311      0.00
Within Subgroups                   73          12,247.74850      167.77736
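The F-values of Table 144 follow directly from the residual sums of squares. In this Python sketch the method difference is computed from the two residual lines of Table 143, while the other two differences are taken as printed in Table 144; each source carries one degree of freedom, and the within mean square has 73.

```python
ss_within_plus_method = 12387.75693     # Table 143
ss_within, df_within = 12247.74850, 73
ms_within = ss_within / df_within

method_ss = ss_within_plus_method - ss_within   # difference with 1 degree of freedom

differences = {
    "Method": method_ss,
    "High School Chemistry Status": 2296.90571,   # Table 144
    "Interaction": 0.09311,                       # Table 144
}
# Each difference has 1 degree of freedom, so its mean square equals its sum of squares
f_values = {source: (ss / 1) / ms_within for source, ss in differences.items()}
for source, f in f_values.items():
    print(f"{source}: F(1, {df_within}) = {f:.2f}")
```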

Multiple-classification analysis of covariance has made it possible to test the hypothesis concerning the methods of teaching while controlling on individual differences in aptitude and ability by means of control variables and controlling on high school chemistry status by means of stratification.

¹ As in the case of the multiple classification analysis of variance computation, some statisticians disagree as to the appropriate denominator in the F-ratio. For further information, the references cited in Chapter 11 may be examined.

The F-value of 0.83 for method is nonsignificant. Therefore, in so far as scholastic aptitude is controlled by ACE raw scores, prior achievement is controlled by high school grade-point averages, concurrent achievement is controlled by college algebra final examination scores, and no other pertinent factor related to achievement in beginning college chemistry contributes a bias, the effectiveness of the two methods of teaching cannot be shown to differ.

Likewise the F-value of 0.00 for interaction is nonsignificant. Therefore, under the same conditions as stated in the foregoing paragraph, it could not be demonstrated that the difference in chemistry achievement between students having and not having received credit in high school chemistry depends upon whether they were taught by Method A or by Method B.

The F-value of 13.69 for high school chemistry status is significant. Thus, when the individual differences in scholastic aptitude and academic ability were controlled as already described, evidence exists that having or not having received credit in high school chemistry influences first-quarter beginning college chemistry achievement. Also, since the F-value was significant, control of high school chemistry status by means of stratification in this problem is justified.

If the investigator wants to determine whether high school chemistry tends to influence achievement in beginning college chemistry in the same manner as discovered in the first example in this chapter, the appropriate means in Table 138 are substituted in the within alone regression equation shown in Table 142, as follows for the group having received credit in high school chemistry:

$$\text{Adjustment of } \bar{Y}_c = (\bar{X}_{1c} - \bar{X}_1)a_1 + (\bar{X}_{2c} - \bar{X}_2)a_2 + (\bar{X}_{3c} - \bar{X}_3)a_3$$
$$= (114.60 - 114.54)(0.0600) + (\ldots - 2.796)(8.318) + (\ldots - 22.89)(0.829) = -1.05$$

Since only two categories are present and they contain the same number of cases, the adjustment for the group not having received credit in high school chemistry is also 1.05, but the sign is positive. Because the group having received high school chemistry credit is the less talented group in terms of the control variables, its criterion mean score is adjusted upward by 1.05 to 82.15. The second criterion mean is adjusted downward by 1.05 to 71.35. Obviously the first group is superior to the second in achievement in beginning college chemistry. This conclusion is the same as that of the first example in the chapter.

Application of the analysis of covariance to data classified in three or 
more ways can also be made, the procedure being a logical extension of 
that for double classification. 


Exercises 


1. One hundred pupils were taught a certain course in four different classes, three of which met in the morning and one of which met during the last period in the afternoon. Using the final examination score at the end of the semester as a criterion, the instructor wished to compare the achievement of the 25 pupils in the afternoon class with that of the 75 pupils in the morning classes. In order to control on individual differences which may exist between the classes, he administered a pretest at the beginning of the semester, and tabulated the intelligence quotients and reading comprehension raw scores of all pupils.

a. Ignoring the control variables, test the hypothesis that the morning pupils did not differ from the afternoon pupils in course achievement as measured by the final examination scores. Interpret your results.

b. Controlling on the pretest scores, the intelligence quotients, and the reading comprehension scores, test the same hypothesis. Interpret your results.

c. Using the within subgroups regression equation, find and interpret the multiple coefficient of correlation between the criterion and a combination of pretest scores, intelligence quotients, and reading comprehension scores.

d. Considering the computations and interpretations made in Exercises 1a and 1b, cite the two principal advantages of the second (controlled) test of significance over the first (uncontrolled) test of significance.


                                         MORNING     AFTERNOON
                                         CLASSES     CLASS         TOTAL
VARIABLE                      SYMBOL     (k = 75)    (k = 25)      (N = 100)
Final Examination Score       ΣY          2,938         947          3,885
Pretest Score                 ΣX1         1,986         691          2,677
Intelligence Quotient         ΣX2         7,613       2,566         10,179
Reading Comprehension Score   ΣX3         3,162       1,027          4,189
Squares                       ΣY²                                  165,632
                              ΣX1²                                  83,523
                              ΣX2²                               1,078,119
                              ΣX3²                                 185,689
Crossproducts                 ΣX1Y                                 114,266
                              ΣX2Y                                 412,210
                              ΣX3Y                                 167,351
                              ΣX1X2                                288,615
                              ΣX1X3                                116,093
                              ΣX2X3                                427,630


2. A group of 110 unmarried male electrical engineers who had successfully 
completed a four-year undergraduate program at a university were asked to 
designate their principal source of funds for paying college expenses. For each 
of the following categories the all-college grade-point averages, scholastic apti- 
tude scores, and mechanical aptitude scores are listed. 


                                         ALL-COLLEGE    SCHOLASTIC    MECHANICAL
PRINCIPAL SOURCE                         GRADE-POINT    APTITUDE      APTITUDE
OF FUNDS FOR                             AVERAGE        SCORES        SCORES
COLLEGE EXPENSES            NUMBER       ΣY             ΣX1           ΣX2
Veterans Administration       41         134.99         4,534         1,977
Fellowships                   17          59.53         2,023           853
Outside Employment            29          94.64         3,182         1,482
Private Funds                 23          70.61         2,458         1,064

ΣY² = 1,200.1409       ΣX1Y = 40,134.12
ΣX1² = 138,288         ΣX2Y = 17,620.77
ΣX2² = 267,834         ΣX1X2 = 595,534


Controlling on scholastic aptitude scores and mechanical aptitude scores, test 
the hypothesis that the electrical engineers, when classified according to the 
various sources of funds used to pay college expenses, do not differ in their all- 
college achievement. 

3. A current events test was administered to 294 high school pupils who were entering an American History class. Each pupil also indicated whether he read a daily newspaper consistently and whether his curriculum was college preparatory or vocational. The examination scores (Y) along with intelligence quotients (X1) and reading ability scores (X2) are summarized below.

READS DAILY                              CURRICULUM
NEWSPAPER                            COLLEGE
CONSISTENTLY      SYMBOL             PREPARATORY      VOCATIONAL
Yes               k                      98               49
                  ΣY                  8,173            3,971
                  ΣX1                11,461            5,813
                  ΣX2                 2,410            1,145
No                k                      98               49
                  ΣY                  7,571            3,413
                  ΣX1                11,268            5,421
                  ΣX2                 2,334            1,137

Total   ΣY² = 1,901,280      ΣX1Y = 2,707,428
        ΣX1² = 4,059,617     ΣX2Y = …
        ΣX2² = …             ΣX1X2 = 836,290

By means of an analysis of covariance with intelligence quotients and reading ability scores as control variables, determine, on the basis of the examination scores, whether vocational pupils differ from college preparatory pupils, whether pupils who read a daily newspaper consistently differ from those who do not, and whether there is an interaction between school curriculum and tendency to read a daily newspaper. Interpret your results.


19

Further Applications of Discriminant Analysis


The problems of classification and selection of individuals are acute for 
personnel workers in education, industry, government services, the U.S. 
Armed Forces, and so forth, and especially for the individuals who are
being classified or selected for some purpose. All concerned with these 
problems are agreed that classification and selection should be based 
upon more than random choice, timeliness of application, or hasty per- 
sonal interviews. Counselors and other personnel officers are in unanimous 
agreement that predictions should not be made upon the basis of any one 
characteristic but rather upon some type of composite forecast of satis- 
factoriness from many pertinent characteristics of individuals and the 
nature of the job or activity. In some instances, both classification and 
selection are desired. The inherent problems, then, become more complex. 
There is unanimous opinion that no mathematical analysis, no personal 
interview, no inspection of previous record, nor any combination thereof, 
will be 100 per cent satisfactory. 

In some instances, the problem is one of (1) classification alone, (2) 
selection alone, (3) classification followed by selection, or (4) selection 
when characteristics of classified groups need to be evaluated or con- 
trolled. Discriminant analysis is a powerful tool in analyzing any of the 
foregoing four types of problems. Originally the discriminant function 
was developed by Fisher¹ for the purpose of classification alone. During
the past fifteen years considerable attention has been given to his tech-
nique. In educational and psychological situations, the report of a con-
ference at Harvard University² is of particular interest. Some further


¹ R. A. Fisher, “The Use of Multiple Measurements in Taxonomic Problems,”
Annals of Eugenics, 7:179-188, 1936. 

² David V. Tiedeman, and others, “The Multiple Discriminant Function—A Symposium,”
Harvard Educational Review, Spring, 1951, pp. 71-95.




FURTHER APPLICATIONS OF DISCRIMINANT ANALYSIS 365 


distinction may be needed among the four types of problems for which 
discriminant analysis may be appropriately used. 

An example of a situation involving classification alone may arise in a 
university including colleges of engineering and veterinary medicine. Both 
students and personnel officers may wish to know whether a student's 
characteristics more nearly parallel those of engineering students or of 
veterinary-medicine students. If a single characteristic such as scholastic 
aptitude, as indicated by the ACE Psychological Examination, is em- 
ployed, a score halfway between the mean for engineering students and
the mean for veterinary-medicine students can be considered as a cutting
point. On the one side of this cutting point a student is classified as more
like engineering students and on the other side as more like veterinary-
medicine students.

If a test of significance is desired to indicate the usefulness of the dis-
crimination, a pooled-variance t-test is quite appropriate. It tests whether
the ACE scores for the population of engineering students and for the
population of veterinary-medicine students could form a single homogeneous
population, i.e., one with a common mean (a common variance being assumed
by the pooled test). With ACE scores, it would not be surprising to find that
this null hypothesis could not be disproven, although it would be expected
that, to some extent, forecasts of achievement would be satisfactory
within either group.
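The single-characteristic procedure just described (a cutting point halfway between the group means, plus a pooled-variance t-test) can be sketched in Python as follows. The score lists are hypothetical placeholders, not data from the text, and the helper names `pooled_t` and `cutting_point` are illustrative only.

```python
from math import sqrt
from statistics import mean


def pooled_t(group_a, group_b):
    """Pooled-variance t statistic for the difference between two means.

    Returns (t, degrees_of_freedom)."""
    mean_a, mean_b = mean(group_a), mean(group_b)
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    df = len(group_a) + len(group_b) - 2
    pooled_var = (ss_a + ss_b) / df  # common variance estimate from both groups
    se = sqrt(pooled_var * (1 / len(group_a) + 1 / len(group_b)))
    return (mean_a - mean_b) / se, df


def cutting_point(group_a, group_b):
    """Score halfway between the two group means."""
    return (mean(group_a) + mean(group_b)) / 2


# Hypothetical scores, for illustration only.
engineering = [1, 2, 3]
veterinary = [4, 5, 6]
t, df = pooled_t(engineering, veterinary)
cut = cutting_point(engineering, veterinary)
print(f"t = {t:.3f} with {df} d.f.; cutting point = {cut}")
```

A student scoring above the cutting point would be classed with the group having the higher mean; the t-value is referred to the t-table with the printed degrees of freedom.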

For these groups, however, if scores were available from a farming-interest
scale, it is probable that the null hypothesis would be rejected. The
distribution of such interest scores would suggest two populations, perhaps
as follows:

[Figure: distributions of farming-interest scores for the two groups, not reproduced]

The counselor or personnel officer, of course, has to consider more
than one characteristic. He wishes to weight several characteristics in
such a way that maximum discrimination will be obtained between the
two groups of students. In terms of a mathematical expression, a function
is desired of the form

$$v = a_1x_1 + a_2x_2 + a_3x_3 + \cdots$$

in which the x's are measurements of different characteristics and the a's are
weights which will produce maximum distinction between the two
groups of students. If this function should be solved for each student, a


366 STATISTICAL METHODS 


series of predicted scores would be obtained. A t-test could then be made 
between the means of the predicted scores for the two groups of students. 
Fortunately such a solution for each individual student is unnecessary. 
The methods for making the t-test and for ascertaining the appropriate 
weights will be shown later in this chapter. The method of classification 
implies that the two groups represent a noncontinuum or nonvariable 
characteristic. In selection, on the other hand, the two groups are con- 
sidered to represent a continuum or variable characteristic which is no 
more accurately available than in a dichotomy. The methods for analysis 
suitable in such situations have been shown in the chapter, “Serial Cor- 
relation and Discriminant Analysis.” Some research workers have used 
the technique appropriate for classification in situations involving selec- 
tion. In most respects, the two methods yield similar interpretation. The 
tests of significance are mathematically equivalent although the formulas 
upon inspection appear different. The weights (a₁, a₂, a₃, and so on), al-
though not equal, are proportional under the assumptions suitable either
for classification or for selection. The methods of the foregoing chapter
dealing with problems of selection have a decided advantage for the
counselor, since they yield, for any given individual, the probability
of eventually appearing in either segment of the dichotomy.

In many situations, particularly in industry and the U.S. Armed Forces, 
the first problem is one of classification. After the groups have been iden- 
tified, the technique suitable for selection follows in each group. No new 
statistical methods are necessary. The methods appropriate for classifi- 
cation are followed by the methods appropriate for selection. 

The problem of selection also frequently arises when characteristics of
certain classified groups need to be evaluated or controlled. One example
might be the evaluation of stenographers on the basis of scores on several
tests when the women have been classified as married or single. A complete
analysis should yield evidence concerning the relative satisfactoriness of
married and single women as well as competence as suggested by test scores.
The methods here appropriate are similar to covariance. In situations such
as the one just described, covariance analysis is appropriate whenever the
criterion is numerically expressed. If the criterion is a dichotomously
expressed continuum characteristic, the covariance methods need some slight
modification, as described later in this chapter.


MAXIMUM SEPARATION OF TWO POPULATIONS


The Kuder Preference Record is frequently used to find the pattern of
interests which distinguish certain groups. For the present purpose, the
scores on four interests for 42 students in mechanical engineering and 52
students in electrical engineering have been chosen for illustration. The




TABLE 145. Kuder Interest Scores of Engineering Students

                                  ENGINEERING CURRICULUM
                           MECHANICAL      ELECTRICAL       BOTH
INTEREST                    k₁ = 42         k₂ = 52        N = 94
Mechanical      ΣX₁           4364            4938          9302
                X̄₁          103.90           94.96         98.96
Scientific      ΣX₂           3240            4198          7438
                X̄₂           77.14           80.73         79.13
Artistic        ΣX₃           2280            2472          4752
                X̄₃           54.29           47.54         50.55
Clerical        ΣX₄           2274            2550          4824
                X̄₄           54.14           49.04         51.32

DIFFERENCE IN MEANS
(Mechanical minus Electrical)

d₁ =  8.943223
d₂ = −3.587912
d₃ =  6.747253
d₄ =  5.104396


mean scores are shown in Table 145. The differences in means have been
found by subtracting the mean for the electrical-engineering students
from that for the mechanical-engineering students. This choice was
entirely arbitrary; the procedure would have been equally satisfactory if
reversed. It follows from this choice that the higher the v-scores predicted
from a discriminant function, the more a student's interest pattern parallels
that of mechanical-engineering students.
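The equations for deriving the discriminant function from four variables are:

$$
\begin{aligned}
d_1 &= a_1\Sigma x_1^2 + a_2\Sigma x_1x_2 + a_3\Sigma x_1x_3 + a_4\Sigma x_1x_4\\
d_2 &= a_1\Sigma x_1x_2 + a_2\Sigma x_2^2 + a_3\Sigma x_2x_3 + a_4\Sigma x_2x_4\\
d_3 &= a_1\Sigma x_1x_3 + a_2\Sigma x_2x_3 + a_3\Sigma x_3^2 + a_4\Sigma x_3x_4\\
d_4 &= a_1\Sigma x_1x_4 + a_2\Sigma x_2x_4 + a_3\Sigma x_3x_4 + a_4\Sigma x_4^2
\end{aligned}
$$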


where the subscripts refer to the interests as follows:

x₁ = mechanical interest
x₂ = scientific interest
x₃ = artistic interest
x₄ = clerical interest


The sums of squares and crossproducts needed are the within-group
deviation values, as follows:




                                        DEVIATION SCORES
SYMBOL        RAW SCORES           TOTAL            WITHIN
ΣX₁²            931578          11075.830          9217.543
ΣX₂²            601266          12714.469         12415.374
ΣX₃²            258384          18155.235         17097.495
ΣX₄²            261148          13584.426         12979.066
ΣX₁X₂           736814            768.511          1514.034
ΣX₁X₃           474160           3914.213          2512.220
ΣX₁X₄           476526           −844.723         −1905.352
ΣX₂X₃           373358          −2656.638         −2094.177
ΣX₂X₄           377352          −4359.824         −3934.318
ΣX₃X₄           237804          −6064.595         −6864.791
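The total and within columns above can be reproduced from the raw sums. The sketch below assumes only the Table 145 sums and the raw-score column: a total deviation value is ΣXᵢXⱼ − (ΣXᵢ)(ΣXⱼ)/N, and, with just two groups, the within value subtracts the between-group part (k₁k₂/N)dᵢdⱼ. The variable names are illustrative.

```python
K1, K2 = 42, 52  # mechanical, electrical
N = K1 + K2

# Sums of X1..X4 by group (Table 145).
SUMS_MECH = [4364, 3240, 2280, 2274]
SUMS_ELEC = [4938, 4198, 2472, 2550]

# Raw sums of squares and crossproducts for both groups combined, keyed by (i, j).
RAW = {
    (0, 0): 931578, (1, 1): 601266, (2, 2): 258384, (3, 3): 261148,
    (0, 1): 736814, (0, 2): 474160, (0, 3): 476526,
    (1, 2): 373358, (1, 3): 377352, (2, 3): 237804,
}

# Differences in means, mechanical minus electrical.
d = [m / K1 - e / K2 for m, e in zip(SUMS_MECH, SUMS_ELEC)]
grand = [m + e for m, e in zip(SUMS_MECH, SUMS_ELEC)]

total = [[0.0] * 4 for _ in range(4)]
within = [[0.0] * 4 for _ in range(4)]
for (i, j), raw in RAW.items():
    dev_total = raw - grand[i] * grand[j] / N   # deviations from the general mean
    between = K1 * K2 / N * d[i] * d[j]         # between-group part for two groups
    for r, c in ((i, j), (j, i)):               # fill both halves of the symmetric matrix
        total[r][c] = dev_total
        within[r][c] = dev_total - between

print([round(x, 6) for x in d])
print(round(total[0][0], 3), round(within[0][0], 3))
```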


When these needed values are substituted in the equations, the values
of a₁, a₂, a₃, and a₄ obtained by simultaneous solution yield the discriminant
function

$$v = 0.00099301x_1 - 0.000050848x_2 + 0.00057481x_3 + 0.00082767x_4$$


The difference in the means (D) of the two groups on the variable v can
be found by substituting for x₁, x₂, x₃, and x₄ the differences in the means
for the four interests. Thus D = 0.01716634, which is also the within-group
sum of squares of v. The analysis is then shown in Table 146. The number
of degrees of freedom for the discriminant function is the number of
variables, m. The sum of squares for the function is

$$\frac{k_1k_2}{N}D^2$$

where k₁ and k₂ are the frequencies in the distributions. Actually this value
does not need to be computed if the purpose is to test the significance of the
discrimination between mechanical-engineering students and electrical-engineering
students by means of an F-test.
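The simultaneous solution can be checked with a short Python sketch. It uses the within-group deviation values and mean differences printed above; the small Gaussian-elimination helper `solve` is written out so that only the standard library is needed, and it is an illustration rather than the computing routine used for the text. D is then Σaᵢdᵢ.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


# Within-group deviation sums of squares and crossproducts (x1..x4).
W = [
    [9217.543, 1514.034, 2512.220, -1905.352],
    [1514.034, 12415.374, -2094.177, -3934.318],
    [2512.220, -2094.177, 17097.495, -6864.791],
    [-1905.352, -3934.318, -6864.791, 12979.066],
]
# Differences in means, mechanical minus electrical (Table 145).
d = [8.943223, -3.587912, 6.747253, 5.104396]

a = solve(W, d)
D = sum(ai * di for ai, di in zip(a, d))  # difference in mean v = within SS of v
print([f"{ai:.8f}" for ai in a], round(D, 8))
```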


TABLE 146. Analysis of Maximum Separation

SOURCE
OF VARIATION        DEGREES OF FREEDOM         SUM OF SQUARES
Function            m            4             (k₁k₂/N)D²
Within              N − m − 1    89            D
Total               N − 1        93            (k₁k₂/N)D² + D
From Table 146, the mean square for the function is

$$\frac{k_1k_2D^2}{Nm}$$

and for within is

$$\frac{D}{N-m-1}$$




Since F equals the former divided by the latter,

$$F = \frac{N-m-1}{m}\left(\frac{k_1k_2}{N}\right)D$$

Thus,

$$F = \frac{89}{4}\left[\frac{(42)(52)}{94}\right](0.01716634) = 8.87$$

There is ample evidence from the analysis* to conclude that the pattern
of interests differs between mechanical- and electrical-engineering stu- 
dents. 
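The F-value can be reproduced directly from the Table 146 layout. This is a minimal sketch; `discriminant_f` is an illustrative name, and the result would still be referred to an F-table with m and N − m − 1 degrees of freedom.

```python
def discriminant_f(D, k1, k2, m):
    """F for the separation of two groups on a discriminant function (Table 146 layout)."""
    n = k1 + k2
    ms_function = (k1 * k2 / n) * D ** 2 / m  # (k1 k2 / N) D^2 on m d.f.
    ms_within = D / (n - m - 1)               # D on N - m - 1 d.f.
    return ms_function / ms_within


F = discriminant_f(0.01716634, 42, 52, 4)
print(round(F, 2))
```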

A counselor may desire to report to a counselee which of these types of 
students his interest pattern more nearly parallels. If so, a critical score 
is needed which may be found by solving the discriminant function twice, 
once by inserting the mean values of X₁, X₂, X₃, and X₄ for mechanical-
engineering students and then the mean values for these variables for
electrical-engineering students. The critical score is considered to lie mid-
way between these predicted v-scores. When the mean scores, shown in
Table 145, are inserted into the discriminant function, the v-scores are:

For mechanical-engineering students, 0.175268
For electrical-engineering students, 0.158102

These differ by D = 0.01716634, as they must, since D was obtained by
substituting the differences in means. The critical score is then 0.166685,
midway between the foregoing predictions. Thus, if a student's scores on
these four interests are known, they may be inserted into the discriminant
function, yielding a v-score for him. If this score is greater than 0.166685,
his pattern of interests more closely resembles that of mechanical-engineering
students. Conversely, if his score is less than 0.166685, his pattern of
interests is more in line with that of electrical-engineering students.
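The critical score can be checked by evaluating the discriminant function at each group's mean vector. A minimal sketch, with illustrative names; as a consistency check, the two predicted v-scores must differ by exactly D = 0.01716634, because D was obtained by substituting the differences in means.

```python
WEIGHTS = [0.00099301, -0.000050848, 0.00057481, 0.00082767]
MEANS_MECH = [s / 42 for s in (4364, 3240, 2280, 2274)]  # Table 145 sums / k1
MEANS_ELEC = [s / 52 for s in (4938, 4198, 2472, 2550)]  # Table 145 sums / k2


def v_score(scores):
    """Value of the four-interest discriminant function for one set of scores."""
    return sum(w * x for w, x in zip(WEIGHTS, scores))


v_mech = v_score(MEANS_MECH)
v_elec = v_score(MEANS_ELEC)
critical = (v_mech + v_elec) / 2  # midway between the predicted group means


def classify(scores):
    """Group whose interest pattern the scores more nearly parallel."""
    return "mechanical" if v_score(scores) > critical else "electrical"


print(round(v_mech, 6), round(v_elec, 6), round(critical, 6))
```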


ELIMINATION OF VARIABLES FROM THE 
DISCRIMINANT FUNCTION 


The personnel officer is constantly confronted with the possibility of
eliminating tests from a battery. Usually such an elimination, although
probably not in the case of the interest data just shown, is accompanied
by a reduction in cost. Normally it is difficult to justify a test score in a
battery which does not significantly enhance forecasting effectiveness. In
the example shown, the smallest contribution apparently was that of
the scientific interest. When a discriminant function was computed for the
remaining three variables, it was found to be

$$v = 0.000984880x_1 + 0.000591486x_3 + 0.000850706x_4$$

* The F of 8.87 would also have been obtained had the method of selection shown
in the chapter, “Serial Correlation and Discriminant Analysis,” been followed.




which yields D = 0.01714125. This value cannot be compared directly
with the D of 0.01716634, since a variable has been eliminated, until
adjustment is made so that the sums of squares for total are equal. Probably
the most feasible method is to adjust the totals to equal unity. As seen
from Table 146, if D is divided by the total sum of squares,

$$\frac{k_1k_2}{N}D^2 + D$$

a new within sum of squares (D') is obtained, which readily reduces to

$$D' = \frac{N}{N + k_1k_2D}$$

Thus, for the four-variable regression

$$D' = \frac{94}{94 + (42)(52)(0.01716634)} = 0.714876$$

and for the three-variable regression

$$D' = \frac{94}{94 + (42)(52)(0.01714125)} = 0.715174$$


Thus, in the sample shown, there is little difference between the four-variable
and three-variable functions in the proportion of individual differences in
v-scores not associated with the engineering curriculum chosen, both proportions
lying between 71 per cent and 72 per cent.
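The adjustment of the total to unity can be computed directly from D. A minimal sketch; the function name is illustrative, and the closed form N / (N + k₁k₂D) is simply D divided by the Table 146 total (k₁k₂/N)D² + D.

```python
def proportion_not_associated(D, k1, k2):
    """D' = D / [(k1 k2 / N) D^2 + D], which reduces to N / (N + k1 k2 D)."""
    n = k1 + k2
    return n / (n + k1 * k2 * D)


four_var = proportion_not_associated(0.01716634, 42, 52)
three_var = proportion_not_associated(0.01714125, 42, 52)
print(round(four_var, 6), round(three_var, 6))
```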

To test the significance of the loss by elimination of the scientific in- 
terest from the battery, an F-test may be made. It is not necessary for 
such a test that only one variable be eliminated as is here done. The test 
is shown in Table 147. 




TABLE 147. Loss by Elimination of Scientific Interest 


SOURCE DEGREES OF SUM OF MEAN 
OF VARIATION FREEDOM SQUARES SQUARE 
3 Variable Residuals 90 0.715174 
4 Variable Residuals 89 0.714876 0.008032 
Loss 1 0.000298 0.000298 


Since the mean square for loss is less than that for the four-variable
residuals, no contribution was proven for the scientific interest in the
interest pattern distinguishing students of mechanical and electrical
engineering.
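The loss test of Table 147 amounts to a few subtractions and divisions. A minimal sketch using the printed adjusted residuals; an F below 1, as here, cannot be significant.

```python
ss_three, df_three = 0.715174, 90  # residuals with scientific interest eliminated
ss_four, df_four = 0.714876, 89    # residuals with all four interests

loss_ss = ss_three - ss_four
loss_df = df_three - df_four
ms_loss = loss_ss / loss_df
ms_residual = ss_four / df_four
F = ms_loss / ms_residual
print(round(ms_residual, 6), round(loss_ss, 6), round(F, 4))
```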




ATTRITION IN GROUPS—SINGLE 
CLASSIFICATION 


The prediction of attrition-survival in a single group was discussed in
a preceding chapter. In certain situations the problem
of evaluating attrition in two or more groups with prior educational ex- 
perience becomes highly important. For example, an investigator might 
desire to compare the survival-attrition of college matriculants when clas- 
sified according to some characteristic such as whether their parents had 
graduated from college. Survival in this case refers to college graduation. 

In Table 148 a sample of 72 high school graduates who attended col- 
lege has been classified on the basis of graduation and attrition when the 
education of the parents has been classified into two groups, i.e., neither 
parent a college graduate and one or both parents college graduates. 


TABLE 148. Attrition-Graduation of Matriculants and Education of Parents

COLLEGE EDUCATION OF PARENTS       GRADUATION    ATTRITION    TOTAL
One or Both Parents Graduated          31             5          36
Neither Parent Graduated               18            18          36
Total                                  49            23          72


The straightforward method of testing the significance of the difference 
between these two groups on attrition-graduation would be to compute 
chi square. The values needed for the computation are shown in Table 
148. In this case chi square is 10.796, which is significant at the 1 per cent
level.
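The reported chi square appears to be the ordinary Pearson value for a fourfold table, without a continuity correction. A minimal sketch using the shortcut formula N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)]; the function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi square, no continuity correction, for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: one or both parents graduated; neither graduated. Columns: graduation, attrition.
chi2 = chi_square_2x2(31, 5, 18, 18)
print(round(chi2, 3))
```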

In this analysis any individual differences in studentship which might
be indicative of attrition and graduation tendency have been ignored, as
the chi-square analysis does not lend itself to sufficient control. Another
method of analysis, more cumbersome to handle than chi square, can be
used which will obviate this difficulty. If the assumption is made that
attrition-graduation tendency is a characteristic that is normally distrib-
uted, an analysis of variance can be made in the usual fashion. The mean
sigma unit of attrition-graduation tendency for those who did graduate is
z/p, and for those who were in the attrition group the mean sigma unit is
−z/q, where p and q are the proportions graduating and not graduating and
z is the ordinate of the unit normal curve at the point dividing them. The
z-values have been obtained from the total group of 72 students and not
from the subgroups. Thus it is possible to assign values of 0.5250857 to
each member of the survival group and −1.1186608 to each member of the
attrition group.
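The two sigma scores can be obtained from the unit normal curve with Python's standard `statistics.NormalDist`. This sketch assumes, as the text does, that the tendency is normally distributed; the helper name is illustrative. Because N·p·(z/p) = N·q·(z/q), the assigned scores always sum to zero over the whole group.

```python
from statistics import NormalDist


def dichotomy_sigma_scores(n_survive, n_attrit):
    """Mean standard scores z/p (upper segment) and -z/q (lower segment)
    of a unit normal distribution split in the proportions p : q."""
    n = n_survive + n_attrit
    p, q = n_survive / n, n_attrit / n
    unit = NormalDist()            # mean 0, standard deviation 1
    z = unit.pdf(unit.inv_cdf(q))  # ordinate at the point of division
    return z / p, -z / q, z


grad, attr, z = dichotomy_sigma_scores(49, 23)
print(round(z, 5), round(grad, 7), round(attr, 7))
```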

The sum of the standard scores for the 36 students whose parents
graduated from college would be

$$(31)(0.5250857) + (5)(-1.1186608)$$

which yields a sum of scores for this group of 10.68435. For the 36 students
neither of whose parents graduated from college the sum of the standard
scores would be

$$(18)(0.5250857) + (18)(-1.1186608)$$

which yields a sum of scores for this group of −10.68435.


The sum of squares for parental education status would then be obtained
in the usual manner:

$$\frac{(10.68435)^2 + (-10.68435)^2}{36} = 6.34196$$

It should be noted that the correction term is zero since the sum of the
standard scores is zero.


The within sum of squares needed in the analysis of variance is

$$\frac{Nz^2}{pq}$$

where the z-, p-, and q-values are found from the total group. In the problem
at hand the within sum of squares is

$$\frac{72(0.35735)^2}{(0.68055)(0.31945)} = 42.291$$

The number of degrees of freedom is N − 1, or 71, since the squares are
found from the attrition-graduation ratio in the sample rather than in
the population. On the other hand, no degrees of freedom associated with
variations within the cells of the table are lost, since the formula does
not consider these variations.¹

¹ That Nz²/(pq) represents the within sum of squares for attrition-survival may be
seen from the example of hypothetical information as contrasted to actual information
shown in the following examples.

        ACTUAL SITUATION                  HYPOTHETICAL SITUATION
              Graduated                          Graduated
     Group    Yes    No                 Group    Yes    No
     A         31     5                 A         25    11
     B         18    18                 B         24    12
     All       49    23                 All       49    23

     Nz²/(pq) = 42.291                  Nz²/(pq) = 42.291

Since Nz²/(pq) is independent of the distribution within the two groups, it is a
within group sum of squares with N − 1 degrees of freedom. [...] of classified
information [...] curve, demands some re[...], in a problem of five groups [...] of
which is of no interest and should not be shown or even computed, is 103, three more
than the number of students involved.


TABLE 149. Analysis of Variance of Parental Education Status

SOURCE OF                     DEGREES OF     SUM OF      MEAN
VARIATION                     FREEDOM        SQUARES     SQUARE       F
Parental Education Status         1            6.342      6.342     10.65
Within                           71           42.291      0.5956


The analysis of variance is shown in Table 149. The analysis indicates
that there is a significant difference in graduation tendency depending
upon the parental education status. The F-value of 10.65 agrees closely
with the chi-square value reported from the same data. It is interesting
to note that the F-value would have been 10.796, identical to chi square,
if 72 rather than 71 had been the number of degrees of freedom. In this
particular case, that is, when the F-value and chi square each have one
degree of freedom for the comparison, the size of the values may be directly
compared.
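The near-identity of F and chi square can be checked numerically. One way to see it: with 72 in place of 71, F is N times the ratio of the between sum of squares to Nz²/(pq), which is the sum of squares of all the sigma scores. Because those scores are a linear recoding of the graduated/not-graduated indicator, that ratio is the squared fourfold-point (phi) correlation, and Nφ² is the chi square of a 2 × 2 table. A minimal sketch:

```python
ss_parent, ss_within = 6.342, 42.291  # Table 149

f_71 = ss_parent / (ss_within / 71)   # as in Table 149
f_72 = ss_parent / (ss_within / 72)   # with N = 72 in place of N - 1 = 71
print(round(f_71, 2), round(f_72, 3))
```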

There is obviously no advantage in the analysis of variance just shown 
over that obtained by the use of chi square unless some attempt is made 
to control upon some characteristic which is presumably related to the 
attrition-graduation tendency. There is some advantage noted in the 
case of multiple classification, in which situation chi square becomes 
more cumbersome. 

When the 72 students are classified on the basis of sex as well as by 
parental education status it is possible for the usual analysis of variance 
to be employed to test the significance of parental education status, 
sex, and interaction. The attrition-graduation for each of the subgroups 
is shown in Table 150.

The sigma scores for graduation and attrition based upon the entire 
group will, of course, remain the same throughout the analysis. As pre-
viously computed, the sigma score for any one individual in the graduation


TABLE 150. Attrition-Graduation Classified by Sex and Parent Education

COLLEGE-EDUCATED                                             PERCENTAGE
PARENT             SEX        GRADUATION     ATTRITION       GRADUATED
Yes                Male           16              2              89
                   Female         15              3              83
                   Both           31              5              86
No                 Male           14              4              78
                   Female          4             14              22
                   Both           18             18              50
Total              Male           30              6              83
                   Female         19             17              53
                   Both           49             23              68




group is 0.5250857, and for any one individual in the attrition group is 
—1.1186608. 


TABLE 151. Sum of Sigma Scores for Graduation Tendency

COLLEGE-EDUCATED
PARENT              MALE          FEMALE          TOTAL
Yes               6.16405         4.52030       10.68435
No                2.87656       −13.56091      −10.68435
Total             9.04061        −9.04061        0.00000


The sums of squares needed for analysis of variance were found in the
usual manner.

For within subgroups:

$$\frac{Nz^2}{pq} = 42.291$$

For parental education status:

$$\frac{(10.68435)^2 + (-10.68435)^2}{36} = 6.3420$$

For sex:

$$\frac{(9.04061)^2 + (-9.04061)^2}{36} = 4.5407$$

For subgroups:

$$\frac{(6.16405)^2 + (4.52030)^2 + (2.87656)^2 + (-13.56091)^2}{18} = 13.9223$$

For interaction: 13.9223 − (4.5407 + 6.3420) = 3.0396
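The two-way sums of squares can be reproduced from the cell counts alone, since every graduate receives 0.5250857 and every case of attrition −1.1186608. A minimal sketch with illustrative names; the correction term is zero because the grand sum of the sigma scores is zero.

```python
GRAD, ATTR = 0.5250857, -1.1186608  # total-group sigma scores (z/p and -z/q)

# (graduated, attrited) counts per (college-educated parent, sex) cell.
CELLS = {
    ("yes", "male"): (16, 2),
    ("yes", "female"): (15, 3),
    ("no", "male"): (14, 4),
    ("no", "female"): (4, 14),
}

cell_sum = {key: g * GRAD + a * ATTR for key, (g, a) in CELLS.items()}
cell_n = {key: g + a for key, (g, a) in CELLS.items()}


def margin_ss(position):
    """Sum over levels of (level total)^2 / (level count); correction term is zero."""
    totals, counts = {}, {}
    for key, s in cell_sum.items():
        level = key[position]
        totals[level] = totals.get(level, 0.0) + s
        counts[level] = counts.get(level, 0) + cell_n[key]
    return sum(totals[lvl] ** 2 / counts[lvl] for lvl in totals)


ss_parent = margin_ss(0)
ss_sex = margin_ss(1)
ss_subgroups = sum(cell_sum[k] ** 2 / cell_n[k] for k in CELLS)
ss_interaction = ss_subgroups - ss_parent - ss_sex
print(round(ss_parent, 4), round(ss_sex, 4), round(ss_subgroups, 4), round(ss_interaction, 4))
```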


The analysis of variance is shown in Table 152. 


TABLE 152. Analysis of Variance of Attrition-Graduation
by Sex and Parent Education


SOURCE DEGREES OF SUM OF MEAN 
OF VARIATION FREEDOM SQUARES SQUARE F 
Parental Education Status 1 6.342 6.342 10.65 
Sex 1 4.541 4.541 7.62 
Interaction 1 3.040 3.040 5.10 
Within 71 42.291 0.5956 




matriculants with college-educated parents and greater for boys than for 
girls. The significant interaction indicates that the sex difference in the
tendency to graduate is not the same for matriculants with college-educated
parents as for those with noncollege-educated parents.

These inferences have been drawn disregarding student aptitude to do
satisfactory college work, which undoubtedly is a major source of attrition.
To remove possible bias and to make a more sensitive test of significance,
some scheme is needed whereby an analysis may be made which corresponds
closely to the analysis of covariance and which allows for individual
differences in student aptitudes.

The I.Q. and the high school grade-point average were available for 
the 72 students and were used as indicators of student aptitude. An anal- 
ysis was made classifying students on the basis of college graduation of 
parents. Differences were noted in the mean I.Q., as well as high school
grade-point average, as indicated in Table 153. These differences might 


TABLE 153. Sums of I.Q. (X₁) and High School Grade-Point Average (X₂)

                              PARENT GRADUATED FROM COLLEGE
                        YES                     NO                      BOTH
                 GRADUA-   ATTRI-       GRADUA-   ATTRI-       GRADUA-   ATTRI-
SEX              TION      TION         TION      TION         TION      TION
Boys    ΣX₁       1801      214          1541      396          3342      610
        ΣX₂      43.47     3.63         35.42     7.48         78.89    11.11
        N           16        2            14        4            30        6
Girls   ΣX₁       1744      347           419     1463          2163     1810
        ΣX₂      48.56     7.01         10.38    30.73         58.94    37.74
        N           15        3             4       14            19       17
Both    ΣX₁       3545      561          1960     1859          5505     2420
        ΣX₂      92.03    10.64         45.80    38.21        137.83    48.85
        N           31        5            18       18            49       23


account for some, or all, of the greater probability of graduation of stu-
dents whose parents were college graduates than those whose parents
were not graduates.

The procedure followed in covariance was used in order to make the
analysis, with Nzd's replacing the usual Σxy's in the solution. Two dis-
criminant equations were developed, one using total deviation values and
one using within deviation values. It should be pointed out here that this
within deviation value is within the parental education status only and
not within the attrition-graduation groups. These values are found in the
same manner as in covariance and will not be reported here.

The general formulas for the solution of the discriminant equation are

$$
\begin{aligned}
Nzd_1 &= a_1\Sigma x_1^2 + a_2\Sigma x_1x_2\\
Nzd_2 &= a_1\Sigma x_1x_2 + a_2\Sigma x_2^2
\end{aligned}
$$




When solving the discriminant equations for total, the values on the
right-hand side of the equation are deviations from the general mean,
exactly like those used when solving the deviation equations in the
analysis of regression and analysis of covariance. The Nzd's occurring in
the left-hand members are the usual Σxy's, in which the y-values are either
−z/q for each individual in the attrition group or z/p for each one in the
survival group. In the example shown, the sigma values of −1.1186608 and
0.5250857 apply to the attrition and survival groups, respectively.

From the information concerning I.Q.’s and high school grade-point 
averages shown in Table 153, the left-hand members of these equations are: 


(5505)(0.5250857) + (2420)(−1.1186608) = 183.4376
and (137.83)(0.5250857) + (48.85)(−1.1186608) = 17.7260


It should be noted that these expressions are mathematically identical 
with the Nzd’s which have been used in the foregoing discriminant anal- 
ysis involving a dichotomous classification. These left-hand members of 
the equations lend themselves to generalization with ensuing analysis 
corresponding to covariance analysis as well as analysis when the crite- 
rion is available in more than two segments. 

The discriminant equation for total will be

$$v = 0.007795x_1 + 0.440156x_2$$

where

x₁ = I.Q.
x₂ = high school grade-point average


If A is defined, as has been done, as similar to the sum of squares for
regression in regression analysis, then

$$A = a_1\Sigma x_1y + a_2\Sigma x_2y + \cdots$$

In the situation just described,

$$A = 0.007795(183.4376) + 0.440156(17.7260) = 9.2321$$

Thus, the traditional formula for biserial correlation

$$R_{bis} = \frac{pq}{z^2}\sqrt{\frac{A}{N}}$$

upon substitution becomes

$$R_{bis} = 1.7024\sqrt{\frac{9.2321}{72}} = 0.6096$$
If the formula for point biserial correlation is employed,

$$R_p = \sqrt{\frac{A}{Nz^2/(pq)}}$$

the coefficient is found to be

$$R_p = \sqrt{\frac{9.2321}{(72)(0.5874)}} = \sqrt{0.2183} = 0.4672$$

If this coefficient is corrected for coarse grouping by the factor of 1.198,
as indicated by the tabled values shown in the Appendix,

$$R = (0.4672)(1.198) = 0.5597$$


The relationship is of sufficient magnitude to indicate clearly, as a test
of significance would confirm, that the combination of I.Q. and high
school grade-point average is particularly useful in indicating graduation
tendency among those who matriculate in college. In an analysis of this
sample, the proportion of the individual differences in graduation
tendency that can be explained by variations in I.Q. and high school
grade-point average is 0.2183, as determined from the formula

$$R_p^2 = \frac{A}{Nz^2/(pq)} = \frac{9.2321}{(72)(0.5874)} = 0.2183$$

There still remains, of course, the proportion of 0.7817 which represents
individual differences in graduation tendency not associated with
variations in I.Q.'s and high school grade-point averages.
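The total solution can be reproduced with the Table 153 totals and the "Total" row of Table 155. The sketch solves the two equations by Cramer's rule and then applies the biserial and point-biserial forms exactly as the text writes them. The coarse-grouping factor 1.198 is taken from the text, not computed.

```python
from math import sqrt

N = 72
GRAD, ATTR = 0.5250857, -1.1186608   # total-group sigma scores
Z, P, Q = 0.35735, 49 / 72, 23 / 72  # ordinate and proportions, total group

# Left-hand members Nzd_i = sum of x_i * y, from the Table 153 "Both" totals.
sxy1 = 5505 * GRAD + 2420 * ATTR     # I.Q.
sxy2 = 137.83 * GRAD + 48.85 * ATTR  # high school grade-point average

# Total deviation sums of squares and crossproducts (Table 155, "Total" row).
S11, S22, S12 = 9258.653, 35.7952, 252.78

# Solve the 2 x 2 system by Cramer's rule.
det = S11 * S22 - S12 ** 2
a1 = (sxy1 * S22 - S12 * sxy2) / det
a2 = (S11 * sxy2 - S12 * sxy1) / det

A = a1 * sxy1 + a2 * sxy2                # analogue of the regression sum of squares
r_bis = (P * Q / Z ** 2) * sqrt(A / N)   # biserial form used in the text
r_p = sqrt(A / (N * Z ** 2 / (P * Q)))   # point-biserial form used in the text
r_corrected = r_p * 1.198                # coarse-grouping factor quoted in the text
print(round(a1, 6), round(a2, 6), round(A, 4), round(r_bis, 4), round(r_p, 4))
```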

To obtain the within-group deviation values necessary in discriminant
analysis, the terms for the right-hand members are found in the same
manner as such values are found in covariance analysis. The left-hand
members, corresponding to the Σxy's, are obtained in a similar fashion,
except that the z-, p-, and q-values are found separately within each
parental education group, the y-value being −z/q for each attrition
individual and z/p for each survival individual.


The sum of the Σxy's within parental education status for I.Q. is
obtained as follows. The z-values for the 31 and 5 distribution for those
individuals whose parents graduated from college and for the 18 and 18
distribution for those individuals whose parents did not graduate from
college are 0.22137 and 0.39894, respectively. The sigma scores for these
two groups are as follows:

Parent Graduated from College:
    Attrition = −1.593864
    Graduation = 0.257075
Parent Not Graduated from College:
    Attrition = −0.79788
    Graduation = 0.79788
The sums of the I.Q.'s shown in Table 153 for these groups, when multiplied
by the foregoing values, yield Σx₁y for the left-hand member of the first
equation, as follows:

$$\Sigma x_1y = (3545)(0.257075) + (561)(-1.593864) + (1960)(0.79788) + (1859)(-0.79788) = 97.76$$

The sum of the within parental status xy's for high school grade-point
average is

$$\Sigma x_2y = (92.03)(0.257075) + (10.64)(-1.593864) + (45.80)(0.79788) + (38.21)(-0.79788) = 12.7558$$


The within parent education status equations needed in the discriminant
analysis are

$$
\begin{aligned}
97.76 &= 8114.639a_1 + 178.3991a_2\\
12.7558 &= 178.3991a_1 + 30.9591a_2
\end{aligned}
$$

which yield upon solution

$$v = 0.00342x_1 + 0.3923x_2$$

where x₁ = I.Q. and x₂ = high school grade-point average. The A-value,
corresponding to the sum of squares for regression, is

$$A = (0.00342)(97.76) + (0.3923)(12.7558) = 5.3384$$
The traditional formula for multiple biserial correlation,

$$R_{bis} = \frac{pq}{z^2}\sqrt{\frac{A}{N}}$$

becomes upon solution

$$R_{bis} = 1.7024\sqrt{\frac{5.3384}{72}} = 0.4636$$

If the formula for point biserial correlation is employed, the coefficient is

$$R_p = \sqrt{\frac{A}{Nz^2/(pq)}} = \sqrt{\frac{5.3384}{(72)(0.5874)}} = \sqrt{0.1262} = 0.3555$$

When this coefficient is corrected for coarse grouping by the appropriate
factor,

$$R = (0.3555)(1.221) = 0.4338$$


As indicated by the solution of the point biserial formula, the proportion
of the individual differences within the groups which is associated with
I.Q. and high school grade-point average was 0.1262 and the proportion not
so associated was 0.8738.
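The within-parental-status solution can be sketched the same way. Here the sigma scores are recomputed inside each parental group with `statistics.NormalDist`, the right-hand values are the "Total minus Parent" figures used in the equations above, and the point-biserial form uses the total-group z, p, and q, as the text does. Because the text rounds the weights before forming A, the unrounded A comes out slightly above 5.3384.

```python
from math import sqrt
from statistics import NormalDist


def sigma_scores(n_grad, n_attr):
    """Mean sigma scores z/p (graduation) and -z/q (attrition) for one group."""
    n = n_grad + n_attr
    p, q = n_grad / n, n_attr / n
    unit = NormalDist()
    z = unit.pdf(unit.inv_cdf(q))  # ordinate at the point of division
    return z / p, -z / q


# Per parental group: (n_grad, n_attr, I.Q. sums grad/attr, HS GPA sums grad/attr), Table 153.
GROUPS = [
    (31, 5, 3545, 561, 92.03, 10.64),    # parent graduated from college
    (18, 18, 1960, 1859, 45.80, 38.21),  # parent did not graduate
]

sxy1 = sxy2 = 0.0
for n_grad, n_attr, x1_grad, x1_attr, x2_grad, x2_attr in GROUPS:
    y_grad, y_attr = sigma_scores(n_grad, n_attr)
    sxy1 += x1_grad * y_grad + x1_attr * y_attr
    sxy2 += x2_grad * y_grad + x2_attr * y_attr

# Within-parental-status deviation values.
S11, S22, S12 = 8114.639, 30.9591, 178.3991
det = S11 * S22 - S12 ** 2
a1 = (sxy1 * S22 - S12 * sxy2) / det
a2 = (S11 * sxy2 - S12 * sxy1) / det
A = a1 * sxy1 + a2 * sxy2

# Point-biserial form from the text, with the total-group z, p, q.
N, Z, P, Q = 72, 0.35735, 49 / 72, 23 / 72
r_p = sqrt(A / (N * Z ** 2 / (P * Q)))
print(round(sxy1, 3), round(sxy2, 4), round(a1, 5), round(a2, 4), round(A, 4), round(r_p, 4))
```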


With the known proportion of the variances in individual differences in
tendency to graduate which can be controlled by I.Q. and high school
grade-point average, it is possible to go back to the information assembled
when an analysis of variance was made without controls and remove, as
in covariance, any allowances which need to be made because of the
individual differences between the groups on the control factors. The
procedure is shown in Table 154.

The sum of squares for total, corresponding to Σy² in covariance analysis,
is the within sum of squares, 42.291, plus the sum of squares for
parental education status which was reported in Table 152 as 6.342. The
proportions of the total and within sums of squares which cannot be
accounted for by variations in I.Q. and high school grade-point average are
shown in the next column. The unadjusted sums of squares are multiplied by
these proportions not predicted to obtain adjusted sums of squares, with the
loss of two degrees of freedom in this case, since two variables were used in
the discriminant function.


TABLE 154. Analysis of Covariance of Parent Education

                                                          ADJUSTED
SOURCE                 UNADJUSTED        NOT
OF VARIATION         d.f.     S.S.       PREDICTED     d.f.     S.S.       M.S.       F
Total                 72     48.634      0.7817         70     38.0172
Within                71     42.291      0.8738         69     36.9539    0.5356
Parental Education Difference                            1      1.0633    1.0633     1.99


r obtaining an F-value is exactly as in 
o sums of squares from each other and 
lue for the F-test. An F-value of 1.99 


From this point the analysis fo 
covariance by subtracting these tw 
obtaining a within mean square value 
was found in this case, which is not sign! cant. ; 
‘hes Gade a s indicated in Table 


W. vere analyzed without control, a 
eb qe hë 4 ficant difference be- 


152, the inference was drawn that there was a signi 
whose parents were college grad- 


tween graduation tendency among those | 
uates and those who were not. This inference needs to be revised from 
the analysis utilizing the LO. and high school grade-point average. 

From the data at hand it was impossible to demonstrate any relationship between college graduation and parents' college education beyond that which could be expected from student differences as indicated by I.Q. and high school grade-point average.
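The arithmetic behind Table 154 can be retraced with a short Python sketch. It assumes only the procedure described in the text: each unadjusted sum of squares is multiplied by its proportion not predicted, and each row gives up one degree of freedom per predictor used in the discriminant function. The function name adjusted_f_test is illustrative and not taken from any library.

```python
def adjusted_f_test(total, within, n_predictors=2):
    """F-test on discriminant-adjusted sums of squares, laid out as in Table 154.

    `total` and `within` are (d.f., unadjusted S.S., proportion not predicted).
    """

    def adjust(row):
        df, ss, proportion = row
        # Each predictor used in the discriminant function costs one d.f.
        return df - n_predictors, ss * proportion

    total_df, total_ss = adjust(total)
    within_df, within_ss = adjust(within)
    diff_df = total_df - within_df                  # 70 - 69 = 1 here
    diff_ms = (total_ss - within_ss) / diff_df      # mean square for the difference
    within_ms = within_ss / within_df
    return total_ss, within_ss, within_ms, diff_ms / within_ms


total_ss, within_ss, within_ms, f = adjusted_f_test((72, 48.634, 0.7817),
                                                    (71, 42.291, 0.8738))
print(f"{total_ss:.4f} {within_ss:.4f} {within_ms:.4f} F = {f:.2f}")
# prints: 38.0172 36.9539 0.5356 F = 1.99
```

The resulting F is then compared with the tabled value of F for 1 and 69 degrees of freedom, exactly as in an ordinary covariance analysis.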

The procedure suitable for single classification can be extended to multiple classification in the same manner as such analysis is done in covariance. In the case of multiple classification, the 72 students were classified by parental education status and sex, and it is required, as in covariance, that the within values be combined with whatever main effects are to be tested to form a within-plus discriminant equation.


TABLE 155. Deviation Sums of Squares and Crossproducts

SOURCE             Σx₁²       Σx₂²       Σx₁x₂       Σx₁y       Σx₂y
Total            9258.653    35.7952    252.78      183.4375    17.7260
Sex                 6.125     0.6198      1.9484     −9.7244    −1.9097
Parent           1144.014     4.8361     74.3809     85.6784     4.9702
Interaction       238.347     1.4619     18.6707     25.1346     1.8582
Within           7870.167    28.8774    157.78       82.3489    12.8073
Within plus
  Sex            7876.292    29.4972    159.7284     72.6245    10.8976
  Parent         9014.181    33.7135    232.1609    168.0273    17.7775
  Interaction    8108.514    30.3393    176.4507    107.4835    14.6655


The values needed in the solution are shown in Table 155. The sums of squares and crossproducts required for the right-hand members of the equations are found exactly as in covariance analysis, but some discussion of the method of obtaining the Σxy's may be in order.

The Σxy's for total, when no classification of students has been made, are, for I.Q.,

$$\Sigma x_1 y = (5505)(0.5250857) + (2420)(-1.1186608) = 183.4376$$

and for high school grade-point average

$$\Sigma x_2 y = (137.83)(0.5250857) + (48.85)(-1.1186608) = 17.7260$$


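The Σxy's associated with sex can be found by subtracting the Σxy's within the sex classification from the Σxy's for total; the Σxy's within the sex classification are computed from the information shown in Table 153. It should be recalled that the y-values are −z/q and z/p for the attrition and survival groups respectively, z being the ordinate of the normal curve at the point dividing the proportions p and q. Thus for the 36 boys, 30 of whom graduated,

$$-\frac{z}{q} = -1.49910 \quad\text{and}\quad \frac{z}{p} = 0.29982$$

and for the girls, 19 of whom graduated,

$$-\frac{z}{q} = -0.84276 \quad\text{and}\quad \frac{z}{p} = 0.75405$$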
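The y-values −z/q and z/p can be reproduced with the NormalDist class of Python's standard statistics module, which supplies both the inverse of the normal curve and its ordinate. This is a minimal sketch; the helper name discriminant_y_values is illustrative only, and the results may differ from values read from printed normal-curve tables in the last decimal place.

```python
from statistics import NormalDist


def discriminant_y_values(graduated, total):
    """Return (-z/q, z/p): the y-values for the attrition and survival groups.

    p is the proportion graduating, q = 1 - p, and z is the ordinate of the
    unit normal curve at the point that divides the area into p and q.
    """
    p = graduated / total
    q = 1.0 - p
    unit = NormalDist()              # mean 0, standard deviation 1
    z = unit.pdf(unit.inv_cdf(p))    # ordinate at the dividing point
    return -z / q, z / p


for label, graduated, total in [("boys", 30, 36), ("girls", 19, 36), ("all 72", 49, 72)]:
    attrition, survival = discriminant_y_values(graduated, total)
    print(f"{label:7s} -z/q = {attrition:.5f}   z/p = {survival:.5f}")
```

A useful property of these values is that they sum to zero over the members of a group: (number graduating)(z/p) and (number not graduating)(z/q) are both equal to Nz.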


The Σxy's within the sex classification are, for I.Q.,

$$\Sigma x_1 y = (3342)(0.29982) + (610)(-1.49910) + (2163)(0.75405) + (1810)(-0.84276) = 193.1620$$

and for high school grade-point average

$$\Sigma x_2 y = (78.89)(0.29982) + (11.11)(-1.49910) + (58.94)(0.75405) + (37.74)(-0.84276) = 19.6357$$

These Σxy's are subtracted from the totals of 183.4376 and 17.7260, yielding Σx₁y = −9.7244 and Σx₂y = −1.9097, which are associated with the sex classification and have been entered in Table 155.
The Σxy's associated with parental education are obtained in a similar manner. Thus for I.Q.,

$$\Sigma x_1 y = 183.4375 - [(3545)(0.257075) + (561)(-1.593864) + (1960)(0.79788) + (1859)(-0.79788)] = 85.6784$$

and for high school grade-point average,

$$\Sigma x_2 y = 17.7260 - [(92.03)(0.257075) + (10.64)(-1.593864) + (45.80)(0.79788) + (38.21)(-0.79788)] = 4.9702$$
The Σxy's within subgroups are obtained from the within subgroup values shown in Table 155. The y-values for the subgroups are

                         ATTRITION     SURVIVAL
College Parents
  Boys                   −1.704510     0.213064
  Girls                  −1.49910      0.299820
Noncollege Parents
  Boys                   −1.3401       0.382886
  Girls                  −0.382886     1.3401


For within subgroups the Σx₁y is

$$\begin{aligned}
\Sigma x_1 y ={}& (1801)(0.213064) + (214)(-1.704510) + (1744)(0.299820) + (347)(-1.49910) \\
&+ (1541)(0.382886) + (396)(-1.3401) + (419)(1.3401) + (1463)(-0.382886) = 82.3489
\end{aligned}$$

In a similar way the Σx₂y for within subgroups is 12.8073.

The Σxy's for interaction are obtained by subtraction: from the Σxy's for total is subtracted the sum of the Σxy's for the two main effects and for within subgroups. Thus, for I.Q., the Σx₁y associated with interaction is

$$\Sigma x_1 y = 183.4375 - (-9.7244 + 85.6784 + 82.3489) = 25.1346$$

and for high school grade-point average

$$\Sigma x_2 y = 17.7260 - (-1.9097 + 4.9702 + 12.8073) = 1.8582$$
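The partition of Σx₁y into parts for sex, parental education, interaction, and within subgroups is simple bookkeeping once the group totals and y-values are known. The Python sketch below uses only the I.Q. totals and y-values quoted in the text; the helper sum_xy is a hypothetical name. Because the printed y-values are rounded, the results agree with Table 155 to about the third decimal place.

```python
def sum_xy(cells):
    """Σxy as the sum over groups of (group total of X) × (group y-value)."""
    return sum(x_total * y for x_total, y in cells)


# (ΣX for I.Q., y-value) pairs quoted in the text.
total = sum_xy([(5505, 0.5250857), (2420, -1.1186608)])
within_sex = sum_xy([(3342, 0.29982), (610, -1.49910),
                     (2163, 0.75405), (1810, -0.84276)])
within_parent = sum_xy([(3545, 0.257075), (561, -1.593864),
                        (1960, 0.79788), (1859, -0.79788)])
within_subgroups = sum_xy([(1801, 0.213064), (214, -1.704510),
                           (1744, 0.299820), (347, -1.49910),
                           (1541, 0.382886), (396, -1.3401),
                           (419, 1.3401), (1463, -0.382886)])

sex = total - within_sex                 # main effect of sex
parent = total - within_parent           # main effect of parental education
interaction = total - (sex + parent + within_subgroups)

for name, value in [("total", total), ("sex", sex), ("parent", parent),
                    ("interaction", interaction), ("within", within_subgroups)]:
    print(f"{name:12s} {value:10.4f}")
```

The same function applied to the grade-point totals gives the Σx₂y column.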


Similar to covariance, four discriminant equations are needed, for (1) within subgroups, (2) within subgroups plus sex, (3) within subgroups plus parental education, and (4) within subgroups plus interaction. The Σxy's needed are shown in Table 155.

The equations needed for obtaining the within subgroup discriminant equation are

$$82.3489 = 7870.167a_1 + 157.78a_2$$
$$12.8073 = 157.78a_1 + 28.8774a_2$$

which yield a discriminant equation of

$$v = 0.0017652x_1 + 0.433862x_2$$


which yields a λ of 5.70196. The proportion of the graduation tendency which cannot be predicted from I.Q. and grade-point average is obtained from the expression 1 − λ/Σy² and found to be 0.8652. This value is entered in Table 156.
The same procedure applied to the within group plus sex values shown in Table 155 gives the equations

$$72.6245 = 7876.292a_1 + 159.7284a_2$$
$$10.8976 = 159.7284a_1 + 29.4972a_2$$

which, upon solution, yield

$$v = 0.00194165x_1 + 0.358932x_2$$

producing a λ of 4.0525. The nonpredictable proportion of the graduation tendency, 0.9042, is also entered in Table 156.


The within group plus parental education equations, taken from Table 155, are

$$168.0273 = 9014.181a_1 + 232.1609a_2$$
$$17.7775 = 232.1609a_1 + 33.7135a_2$$

which, upon solution, yield

$$v = 0.00615017x_1 + 0.484959x_2$$

producing a λ of 9.654758. The resulting proportion of nonpredictable graduation tendency, 0.7717, is entered in Table 156.


Information taken from Table 155 provides the equations for within group plus interaction as follows:

$$107.4835 = 8108.514a_1 + 176.4507a_2$$
$$14.6655 = 176.4507a_1 + 30.3393a_2$$

which, upon solution, yield

$$v = 0.0031332x_1 + 0.465162x_2$$

producing a λ of 7.15858. The resulting proportion of nonpredictable graduation tendency, 0.8307, is entered in Table 156.
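Each of the four pairs of equations can be checked with a few lines of Python. The text does not say which hand method was used to solve them; Cramer's rule is substituted here for brevity. The sketch assumes that λ = a₁Σx₁y + a₂Σx₂y, an assumption that reproduces all four printed values of λ. The names solve_discriminant and systems are illustrative.

```python
def solve_discriminant(s11, s12, s22, s1y, s2y):
    """Solve s1y = s11*a1 + s12*a2 and s2y = s12*a1 + s22*a2 by Cramer's rule.

    Returns (a1, a2, lam), with lam = a1*s1y + a2*s2y.
    """
    det = s11 * s22 - s12 * s12
    a1 = (s1y * s22 - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    return a1, a2, a1 * s1y + a2 * s2y


# (Σx1², Σx1x2, Σx2², Σx1y, Σx2y) for the four rows taken from Table 155.
systems = {
    "within": (7870.167, 157.78, 28.8774, 82.3489, 12.8073),
    "within + sex": (7876.292, 159.7284, 29.4972, 72.6245, 10.8976),
    "within + parent": (9014.181, 232.1609, 33.7135, 168.0273, 17.7775),
    "within + interaction": (8108.514, 176.4507, 30.3393, 107.4835, 14.6655),
}

for name, coefficients in systems.items():
    a1, a2, lam = solve_discriminant(*coefficients)
    print(f"{name:22s} a1 = {a1:.7f}  a2 = {a2:.6f}  lambda = {lam:.4f}")
```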

The analysis of covariance is shown in Table 156. The unadjusted sums of squares are obtained from Table 152. Adjusted sums of squares are found by multiplying the unadjusted sums of squares by the four proportions not predictable from I.Q. and high school grade-point average.

The number of degrees of freedom has been reduced by two in obtaining these adjusted sums of squares, since two prediction variables have been used. The remainder of the analysis is identical with covariance, both in method and in interpretation.

TABLE 156. Analysis of Covariance with Control of I.Q. and High School Grade-Point Average

The tendency for students to complete the program for the baccalaureate degree, which was significantly greater without control for students with college-educated parents than for those with noncollege-educated parents, could not be demonstrated to differ between these two groups of students when I.Q. and high school grade-point average were considered.

When judged by the evidence here shown, boys who enter college tend to graduate more often than girls, when allowances are made for I.Q. and high school grade-point average.

Although graduation tendency was greater for girls with college-educated parents, and smaller for girls with noncollege-educated parents, than would be expected from the graduation-attrition pattern of the boys, this interaction failed to reach the usually demanded significance level.

The foregoing inferences have been drawn from the evidence in Table 
156 in regard to significance and from Table 150 in regard to the group 
excelling whenever significant differences appeared. It would be more 


TABLE 157. Frequencies Adjusted for I.Q. and High School Grade-Point Average

                          PARENT GRADUATED FROM COLLEGE
                          YES                          NO
                    GRADUATED   ATTRITION      GRADUATED   ATTRITION
Boys    Actual         16           2              14           4
        Adjusted       15.99        2.01           14.49        3.51
Girls   Actual         15           3               4          14
        Adjusted       13.88        4.12            4.81       13.19


Entries for this table have been found as illustrated for the subgroup of boys with college-educated parents. In this group, 16 graduated and 2 did not. The proportion graduating was 0.8889. A table of the normal curve reveals that this proportion yields a normal deviate, or sigma score, of 1.2206. It is this sigma score that should be revised: upward whenever the subgroup is inferior on the control variables, and downward whenever it is superior. The amount of revision is found by using the within subgroup discriminant equation

$$v = 0.001765x_1 + 0.433862x_2$$

where

v = amount to be algebraically subtracted from the sigma score, which in this case is 1.2206
x₁ = the algebraic amount by which the mean I.Q. in this subgroup exceeds the mean I.Q. of the entire group, which in this particular illustration is 1.875
x₂ = the algebraic amount by which the mean high school grade-point average in this subgroup exceeds the mean high school grade-point average of the entire group, which in this particular case is 0.02389

The revision necessary then is

$$v = 0.001765(1.875) + 0.43386(0.02389) = 0.0033$$


Thus the adjusted sigma score is 1.2206 − 0.0033 = 1.2173, for which the normal curve table indicates a proportion of 0.8883. Thus, with this group of 18, the expected number to be graduated is 15.99. The actual and adjusted frequencies shown in Table 157 differ but little. Unless differences in subgroups are large and the control factors unusually effective, there appears to be little advantage in making the effort to compute expected frequencies. In any situation the computation of expected numbers is open to serious question whenever differences are found to be nonsignificant.
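The adjustment of one cell of Table 157 can be sketched with Python's statistics.NormalDist. The sketch takes the revision v as an input (here the 0.0033 obtained in the text) rather than recomputing it, and the function name adjusted_graduates is hypothetical.

```python
from statistics import NormalDist


def adjusted_graduates(graduated, n, v):
    """Expected number graduating after the subgroup's sigma score is reduced by v."""
    unit = NormalDist()
    sigma = unit.inv_cdf(graduated / n)     # 16/18 = 0.8889 gives about 1.2206
    return n * unit.cdf(sigma - v)          # proportion read back from the normal curve


expected = adjusted_graduates(16, 18, 0.0033)   # v as obtained in the text
print(f"graduated {expected:.2f}, attrition {18 - expected:.2f}")
# prints: graduated 15.99, attrition 2.01
```

A positive v (subgroup superior on the controls) lowers the expected number graduating; a negative v raises it, matching the direction of revision described above.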

The warning already given for both analysis of variance and covariance needs to be repeated here: disproportionality in multiple classification should be avoided. Disproportionality in the situation here described was overcome by sampling down to 18 students in each of the subgroups by using a table of random numbers.

Discriminant analysis as here used with groups for which a dichotomous criterion is available can be extended to situations in which the criterion has been segmented into more than two groups. There should be no difficulty in making the necessary computations from the discussion in the chapter dealing with serial correlation and discriminant analysis. As the number of segments in the criterion is increased, the more closely will the obtained results approach those from the usual covariance analysis. It is here suggested that the usual covariance technique be applied whenever the criterion consists of six or more segments. This generalization may need to be revised as more experience with discriminant analysis is accumulated.


Exercises 


1. The personnel manager of a large commercial organization was confronted with the problem of finding suitable replacements for clerks and computers employed in the central office. In order to determine which of the two positions would be more suitable for a promising applicant, he gathered the following information from presently employed clerks and computers found to be competent.


                                          CLERKS       COMPUTERS
VARIABLE                                 (k = 69)     (k = 106)
Clerical Interest Score         ΣX₁        3821          6121
Scholastic Aptitude Score       ΣX₂        5196          8063
Clerical Aptitude Score         ΣX₃        2877          3849

ΣX₁² = 591,022          ΣX₁X₂ = 778,297
ΣX₂² = 1,098,973        ΣX₁X₃ = 388,945
ΣX₃² = 269,992          ΣX₂X₃ = 516,867

a. Find the discriminant function which yields the weights to be assigned to the foregoing three numerical variables so as to produce maximum separation between the two types of clerical employees.

b. Test the significance of the discriminant function by means of an F-test. Interpret the results.
Interpret the results. 

c. Determine the necessary critical scores which will allow the personnel 
manager to determine whether a given applicant’s pattern of scores more 
closely resembles that of a clerk or a computer. 

d. Demonstrate whether it is possible to drop the clerical interest scores 
without significantly lessening the prediction efficiency of the discriminant 
function. 

2. In an effort to obtain information helpful to college seniors who were majoring in agricultural education and who must decide between extension teaching and secondary school teaching as a career, extension teachers and secondary school teachers in agriculture with at least five years of experience in their respective areas were studied. All members of the sample were graduates of the same university. The results are summarized in the following table. Note that the higher the cadet teaching rating, the better the candidate's potential as a teacher in the opinion of the rater; also, the higher the attitude score, the more favorable was the attitude toward the teaching profession.


                                              EXTENSION     SCHOOL
                                              TEACHERS      TEACHERS
VARIABLE                                      (k = 35)      (k = 35)
Cadet Teaching Rating                 ΣX₁        132           123
College Average (per cent)            ΣX₂        285
Attitude toward Teaching Score        ΣX₃       1236          1139

ΣX₁² = 1,027           ΣX₁X₂ = 21,043
ΣX₂² = 407,041         ΣX₁X₃ = 8973
ΣX₃² = 85,801          ΣX₂X₃ = 196,347

a. Find the point biserial correlation coefficients between the dichotomous variable and each of the three numerical variables.

b. Use all three numerical variables in computing a discriminant function.


c. What evidence can you find that the pattern of scores for the extension 
teachers differs from that of the secondary school teachers in agriculture? 

3. The sales division of an industrial corporation annually screened large numbers of college graduates who were applying for positions. All candidates were enrolled in intensive four-month training classes. Later an investigator attempted to study the influence of the type of college preparation on the tendency to survive the training period. The classification of the candidates according to college training and according to survival-attrition is shown below. Scholastic aptitude scores were designated as X₁, and college achievement ratings were designated as X₂. The higher the achievement rating, the greater the college achievement.


COLLEGE MAJORS                SURVIVAL     ATTRITION
Engineering           k           185           79
                      ΣX₁      20,749        8,123
                      ΣX₂         373          114
Natural Science       k           116           72
                      ΣX₁      11,763        6,976
                      ΣX₂         200           94
Social Science        k           217          171
                      ΣX₁      21,810       15,610
                      ΣX₂         395          214
Miscellaneous         k            88           69
                      ΣX₁       9,048        6,329
                      ΣX₂         167           77

ΣX₁² = 10,683,622      ΣX₂² = …,702      ΣX₁X₂ = 173,916


a. Ignoring the control variables, test whether candidates with the various college backgrounds differ in their tendency to survive. Interpret the results.

b. Including the control variables, test the same hypothesis as the foregoing. Interpret the results.

c. Adjust the attrition-survival ratios for bias in student aptitude as revealed by the scholastic aptitude scores and the achievement ratings.

4. In a study of graduation-attrition among transfer students, a college of engineering subdivided the sample on the basis of whether the students were veterans of World War II and whether the students first matriculated at a two-year college or a four-year college. The following data resulted. The quantitative score on a scholastic aptitude test was designated as X₁, and the high school grade-point average was designated as X₂.


                              VETERANS OF WORLD WAR II
COLLEGE OF                 YES                           NO
MATRICULATION      GRADUATION   ATTRITION       GRADUATION   ATTRITION
Two-year    k           19           45              12           52
            ΣX₁       1385         2416             901         3180
            ΣX₂      53.03       110.88           38.38       132.72
Four-year   k           28           36              17           47
            ΣX₁       2004         2177            1133         2935
            ΣX₂      79.96        90.57           48.95       123.93

ΣX₁² = 1,095,204      ΣX₂² = 1866.3477      ΣX₁X₂ = 43,512.49


a. Controlling on scholastic aptitude as represented by the quantitative 
score and on high school grade-point averages, test (1) whether transfer 
students who are veterans differ from those who are nonveterans in tend- 
ency to graduate from the engineering college, (2) whether transfer stu- 
dents who first matriculated at two-year colleges differ from those who 
first matriculated at four-year colleges in tendency to graduate from the 
engineering college, and (3) whether there is any interaction between 
veteran status and college status with respect to tendency to graduate 
from the engineering college. Interpret your results. 

b. Adjust the graduation-attrition frequencies for differences in quantitative score and high school grade-point averages for all comparisons yielding significance.


APPENDIX A

Determination of Regression Coefficients

First-term grade-point averages (Y) of engineering college freshmen are to be predicted from scientific interest scores (X₁), scholastic aptitude scores (X₂), high school grade-point averages (X₃), and mechanical aptitude scores (X₄). The data for a sample of 260 engineering freshmen are summarized in the following table.


N = 260          ΣX₁ = 14,790          ΣX₃ = 669.71
ΣY = 599.07      ΣX₂ = 28,420          ΣX₄ = 10,261

                RAW          DEVIATION                     RAW           DEVIATION
SYMBOL         SCORE         SCORE           SYMBOL       SCORE          SCORE
ΣY²             1,574          193.85890      ΣX₄Y       24,283.68         641.15204
ΣX₁²          891,222       49,898.539        ΣX₁X₂    1,595,663       −20,997.769
ΣX₂²        3,220,842      114,317.38         ΣX₁X₃       37,833.54        −262.65576
ΣX₃²            1,820.7997      95.755531     ΣX₁X₄      580,595         −3,098.0385
ΣX₄²          420,303       15,348.689        ΣX₂X₃       75,007.81       1,803.3554
ΣX₁Y           33,657.77      −420.09654      ΣX₂X₄    1,136,657        15,050.769
ΣX₂Y           68,065.81     2,582.8508       ΣX₃X₄       26,763.26         332.89727
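Each deviation value in the table follows from the raw values by the usual computing formula Σxy = ΣXY − (ΣX)(ΣY)/N. A minimal Python sketch, with a hypothetical helper name:

```python
def deviation_sum(raw, sum_a, sum_b, n):
    """Σab = ΣAB − (ΣA)(ΣB)/N; with A = B this is a deviation sum of squares."""
    return raw - sum_a * sum_b / n


N = 260
SUMS = {"X1": 14790, "X2": 28420, "X3": 669.71, "X4": 10261, "Y": 599.07}

print(deviation_sum(891222, SUMS["X1"], SUMS["X1"], N))      # Σx1²
print(deviation_sum(1595663, SUMS["X1"], SUMS["X2"], N))     # Σx1x2
print(deviation_sum(33657.77, SUMS["X1"], SUMS["Y"], N))     # Σx1y
print(deviation_sum(1136657, SUMS["X2"], SUMS["X4"], N))     # Σx2x4
```

Carrying the raw sums to full precision in this way is what makes the later elimination steps trustworthy to eight or nine figures.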


METHOD I: Solution of Simultaneous Equations

Normal equations for deviation score solution:

$$\Sigma x_1y = a_1\Sigma x_1^2 + a_2\Sigma x_1x_2 + a_3\Sigma x_1x_3 + a_4\Sigma x_1x_4$$
$$\Sigma x_2y = a_1\Sigma x_1x_2 + a_2\Sigma x_2^2 + a_3\Sigma x_2x_3 + a_4\Sigma x_2x_4$$
$$\Sigma x_3y = a_1\Sigma x_1x_3 + a_2\Sigma x_2x_3 + a_3\Sigma x_3^2 + a_4\Sigma x_3x_4$$
$$\Sigma x_4y = a_1\Sigma x_1x_4 + a_2\Sigma x_2x_4 + a_3\Sigma x_3x_4 + a_4\Sigma x_4^2$$

Substitute the deviation values in the normal equations:

$$-420.09654 = 49898.539a_1 - 20997.769a_2 - 262.65576a_3 - 3098.0385a_4 \tag{1}$$
$$2582.8508 = -20997.769a_1 + 114317.38a_2 + 1803.3554a_3 + 15050.769a_4 \tag{2}$$
$$64.343686 = -262.65576a_1 + 1803.3554a_2 + 95.755531a_3 + 332.89727a_4 \tag{3}$$
$$641.15204 = -3098.0385a_1 + 15050.769a_2 + 332.89727a_3 + 15348.689a_4 \tag{4}$$


Divide equations (1), (2), (3), and (4) by their respective coefficients of a₄:

$$-0.13560081 = 16.106494a_1 - 6.7777624a_2 - 0.084781309a_3 - a_4 \tag{5}$$
$$0.17160922 = -1.3951293a_1 + 7.5954511a_2 + 0.11981816a_3 + a_4 \tag{6}$$
$$0.19328391 = -0.78899944a_1 + 5.4171529a_2 + 0.28764288a_3 + a_4 \tag{7}$$
$$0.041772430 = -0.20184385a_1 + 0.98058987a_2 + 0.021688971a_3 + a_4 \tag{8}$$

Subtract (or add if the signs of a₄ are unlike) (8) from (5), (6), and (7) successively:

$$-0.093828380 = 15.904650a_1 - 5.7971725a_2 - 0.063092338a_3 \tag{9}$$
$$0.12983679 = -1.1932854a_1 + 6.6148612a_2 + 0.098129195a_3 \tag{10}$$
$$0.15151148 = -0.58715559a_1 + 4.4365630a_2 + 0.26595391a_3 \tag{11}$$

Divide equations (9), (10), and (11) by their respective coefficients of a₃:

$$-1.4871597 = 252.085285a_1 - 91.883938a_2 - a_3 \tag{12}$$
$$1.3231209 = -12.160350a_1 + 67.409716a_2 + a_3 \tag{13}$$
$$0.56969074 = -2.2077344a_1 + 16.681699a_2 + a_3 \tag{14}$$

Subtract (or add if the signs of a₃ are unlike) (14) from (12) and (13) successively:

$$-0.91746892 = 249.87755a_1 - 75.202239a_2 \tag{15}$$
$$0.75343016 = -9.9526160a_1 + 50.728017a_2 \tag{16}$$

Divide equations (15) and (16) by their respective coefficients of a₂:

$$-0.012200021 = 3.3227408a_1 - a_2 \tag{17}$$
$$0.014852641 = -0.19619951a_1 + a_2 \tag{18}$$

Add equations (17) and (18):

$$0.002652620 = 3.1265413a_1 \tag{19}$$

Solving: a₁ = 0.00084841995

Substitute the value of a₁ in equation (18):

0.014852641 = (−0.19619951)(0.00084841995) + a₂

Solving: a₂ = 0.015019101

Substitute the values of a₁ and a₂ in equation (14):

0.56969074 = (−2.2077344)(0.00084841995) + (16.681699)(0.015019101) + a₃

Solving: a₃ = 0.32101971
Substitute the values of a₁, a₂, and a₃ in equation (4) and solve:

a₄ = 0.020253514

Check the values of a₁, a₂, a₃, and a₄ for accuracy by substituting them in equation (1). An identity should be obtained.

The regression equation in deviation form is:

$$y = 0.00084841995x_1 + 0.015019101x_2 + 0.32101971x_3 + 0.020253514x_4$$
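Method I eliminates one unknown at a time by hand. The same system can be solved by ordinary Gaussian elimination with partial pivoting, a different but algebraically equivalent route. The sketch below is self-contained Python with illustrative names; it uses the deviation values from the table and the Σx₃y of 64.343686 that appears on the left of equation (3). Small differences from the printed coefficients in the later decimal places come from rounding in the hand computation.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                          # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


# Deviation sums of squares and cross products: normal equations (1)-(4).
S = [[49898.539, -20997.769, -262.65576, -3098.0385],
     [-20997.769, 114317.38, 1803.3554, 15050.769],
     [-262.65576, 1803.3554, 95.755531, 332.89727],
     [-3098.0385, 15050.769, 332.89727, 15348.689]]
SXY = [-420.09654, 2582.8508, 64.343686, 641.15204]

a = solve(S, SXY)
print([f"{v:.8f}" for v in a])
residual = sum(S[0][j] * a[j] for j in range(4)) - SXY[0]   # check against eq. (1)
print(f"{residual:.2e}")
```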


(3) and (4). The values in the check column are obtained by adding 1.0000 to the sum of all the correlation coefficients in which a certain X-variable is involved. For example,

line (1): 1.0000 + (−0.2780 − 0.1202 − 0.1119 − 0.1351) = 0.3548
line (2): 1.0000 + (−0.2780 + 0.5451 + 0.3593 + 0.5487) = 2.1751, etc.

LINE          X₁        X₂        X₃        X₄        Y        CHECK
(1) X₁      1.0000   −0.2780   −0.1202   −0.1119   −0.1351    0.3548
(2) X₂                1.0000    0.5451    0.3593    0.5487    2.1751
(3) X₃                          1.0000    0.2746    0.4723    2.1718
(4) X₄                                    1.0000    0.3717    1.8937

Line (1) is recopied in line (5).
Line (6) is line (5) divided by the first value in the line, 1.0000.

LINE       X₁        X₂        X₃        X₄        Y        CHECK
(5)      1.0000   −0.2780   −0.1202   −0.1119   −0.1351    0.3548
(6)      1.0000   −0.2780   −0.1202   −0.1119   −0.1351    0.3548


Line (7) is obtained by successively multiplying all values to the right of 1.0000 in line (5) by the value (−0.2780) in the X₂-column of line (6), and subtracting each product from the value found in the corresponding column in line (2). For example,

1.0000 − (−0.2780)(−0.2780) = 0.9227
0.5451 − (−0.2780)(−0.1202) = 0.5117, etc.

Line (8) is line (7) divided by the first value in the line, 0.9227.
Check the accuracy of the values in each line by determining whether their sum equals the check value of that line.

LINE       X₂        X₃        X₄        Y        CHECK
(7)      0.9227    0.5117    0.3282    0.5111    2.2737
(8)      1.0000    0.5546    0.3557    0.5539    2.4642


Line (9) is obtained by performing three steps. First, successively multiply all values to the right of 0.9227 in line (7) by the value (0.5546) in the X₃-column of line (8). Then add each of these products to the corresponding product obtained by successively multiplying all values to the right of −0.2780 in line (5) by the value (−0.1202) in the X₃-column of line (6). Lastly, subtract each sum of the two products from the corresponding column value in line (3). For example,

1.0000 − [(0.5546)(0.5117) + (−0.1202)(−0.1202)] = 0.7018
0.2746 − [(0.5546)(0.3282) + (−0.1202)(−0.1119)] = 0.0791, etc.

Line (10) is line (9) divided by the first value in the line, 0.7018.
Check the accuracy of the values by addition.

LINE       X₃        X₄        Y        CHECK
(9)      0.7018    0.0791    0.1726    0.9535
(10)     1.0000    0.1127    0.2459    1.3586


Line (11) is obtained by performing four steps. First, successively multiply all values to the right of 0.7018 in line (9) by the value (0.1127) in the X₄-column of line (10). Secondly, successively multiply all values to the right of 0.5117 in line (7) by the value (0.3557) in the X₄-column of line (8). Thirdly, successively multiply all values to the right of −0.1202 in line (5) by the value (−0.1119) in the X₄-column of line (6). Lastly, add the appropriate products and subtract each sum from the value found in the corresponding column in line (4). For example,

1.0000 − [(0.1127)(0.0791) + (0.3557)(0.3282) + (−0.1119)(−0.1119)] = 0.8618
0.3717 − [(0.1127)(0.1726) + (0.3557)(0.5111) + (−0.1119)(−0.1351)] = 0.1553, etc.

Line (12) is line (11) divided by the first value in the line, 0.8618.
Check the accuracy of the values by addition.

LINE       X₁        X₂        X₃        X₄        Y        CHECK
(11)                                   0.8618    0.1553    1.0171
(12)                                   1.0000    0.1802    1.1802
(13)     0.0136    0.3647    0.2256    0.1802


The X₄-column value in line (13) is equal to the Y-column value in line (12). The remaining three values in line (13) are found by subtracting from the Y-column values in lines (10), (8), and (6) the sum of all products between the remaining values in that line (excluding 1.0000 and the check column value) and the corresponding values in line (13). For example,

0.2459 − (0.1127)(0.1802) = 0.2256
0.5539 − [(0.3557)(0.1802) + (0.5546)(0.2256)] = 0.3647
−0.1351 − [(−0.1119)(0.1802) + (−0.1202)(0.2256) + (−0.2780)(0.3647)] = 0.0136

The regression coefficient for each X-variable is obtained by multiplying the appropriate value in line (13) by √(Σy²/Σx²) for that variable. For example,

$$a_1 = 0.0136\sqrt{\frac{193.85890}{49{,}898.539}} = 0.0008477$$
$$a_2 = 0.3647\sqrt{\frac{193.85890}{114{,}317.38}} = 0.01502, \text{ etc.}$$

Regression equation in deviation form:

$$y = 0.0008477x_1 + 0.01502x_2 + 0.3210x_3 + 0.02025x_4$$
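Method II can also be sketched in Python: a forward pass over the correlation matrix (the work of lines 5 to 12), a back solution for the values of line (13), and the rescaling by √(Σy²/Σx²). This is a minimal illustration with hypothetical names, not a reproduction of the book's worksheet; because it carries full floating-point precision, its results agree with the four-place hand values only to within rounding of the fourth decimal place.

```python
from math import sqrt


def doolittle_betas(r, ry):
    """Forward solution on a symmetric correlation matrix, then back solution.

    r is the k x k matrix of intercorrelations, ry the correlations with Y.
    Returns the standardized weights (the values of line 13).
    """
    k = len(ry)
    unreduced, reduced = [], []        # like lines (5), (7), (9), (11) and (6), (8), ...
    for i in range(k):
        # Start from row i of the matrix plus its correlation with Y.
        row = [r[i][j] if j >= i else 0.0 for j in range(k)] + [ry[i]]
        for u, d in zip(unreduced, reduced):
            m = d[i]                    # column-i entry of an earlier reduced line
            row = [row[j] - m * u[j] if j >= i else 0.0 for j in range(k + 1)]
        unreduced.append(row)
        reduced.append([v / row[i] for v in row])
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        d = reduced[i]
        beta[i] = d[k] - sum(d[j] * beta[j] for j in range(i + 1, k))
    return beta


R = [[1.0000, -0.2780, -0.1202, -0.1119],
     [-0.2780, 1.0000, 0.5451, 0.3593],
     [-0.1202, 0.5451, 1.0000, 0.2746],
     [-0.1119, 0.3593, 0.2746, 1.0000]]
RY = [-0.1351, 0.5487, 0.4723, 0.3717]
SUM_Y2 = 193.85890
SUM_X2 = [49898.539, 114317.38, 95.755531, 15348.689]

betas = doolittle_betas(R, RY)
coefficients = [b * sqrt(SUM_Y2 / s) for b, s in zip(betas, SUM_X2)]
print([round(b, 4) for b in betas])
print([f"{c:.6f}" for c in coefficients])
```

Working from correlations keeps every entry of the worksheet near unity, which is why four decimal places suffice in the hand computation.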


APPENDIX B 


Tables 


Designation   Title
I             Squares, Square Roots, and Reciprocals
II            Ordinates and Areas of the Normal Curve (In terms of σ-units from mean)
III           Ordinates and σ-units of the Normal Curve (In terms of p)
2 
IV Table of PË and E 
z PQ 
V             Table of Random Numbers
VI            Table of t
VII           The 5 and 1 Per Cent Values of F
VIII          Table of Chi Square
IX            Values of r at the 5 and 1 Per Cent Levels of Significance
X             Values of Z Corresponding to Values of r
XI            Tetrachoric Correlation from the Phi Coefficient
XII           Correction Factors for Coarse Grouping


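TABLE I. Squares, Square Roots, and Reciprocals

  N      N²       √N       1/N   |   N      N²       √N       1/N
  1       1   1.0000  1.000000   |  50    2500   7.0711   .020000
  2       4   1.4142   .500000   |  51    2601   7.1414   .019608
  3       9   1.7321   .333333   |  52    2704   7.2111   .019231
  4      16   2.0000   .250000   |  53    2809   7.2801   .018868
  5      25   2.2361   .200000   |  54    2916   7.3485   .018519
  6      36   2.4495   .166667   |  55    3025   7.4162   .018182
  7      49   2.6458   .142857   |  56    3136   7.4833   .017857
  8      64   2.8284   .125000   |  57    3249   7.5498   .017544
  9      81   3.0000   .111111   |  58    3364   7.6158   .017241
 10     100   3.1623   .100000   |  59    3481   7.6811   .016949
 11     121   3.3166   .090909   |  60    3600   7.7460   .016667
 12     144   3.4641   .083333   |  61    3721   7.8102   .016393
 13     169   3.6056   .076923   |  62    3844   7.8740   .016129
 14     196   3.7417   .071429   |  63    3969   7.9373   .015873
 15     225   3.8730   .066667   |  64    4096   8.0000   .015625
 16     256   4.0000   .062500   |  65    4225   8.0623   .015385
 17     289   4.1231   .058824   |  66    4356   8.1240   .015152
 18     324   4.2426   .055556   |  67    4489   8.1854   .014925
 19     361   4.3589   .052632   |  68    4624   8.2462   .014706
 20     400   4.4721   .050000   |  69    4761   8.3066   .014493
 21     441   4.5826   .047619   |  70    4900   8.3666   .014286
 22     484   4.6904   .045455   |  71    5041   8.4261   .014085
 23     529   4.7958   .043478   |  72    5184   8.4853   .013889
 24     576   4.8990   .041667   |  73    5329   8.5440   .013699
 25     625   5.0000   .040000   |  74    5476   8.6023   .013514
 26     676   5.0990   .038462   |  75    5625   8.6603   .013333
 27     729   5.1962   .037037   |  76    5776   8.7178   .013158
 28     784   5.2915   .035714   |  77    5929   8.7750   .012987
 29     841   5.3852   .034483   |  78    6084   8.8318   .012821
 30     900   5.4772   .033333   |  79    6241   8.8882   .012658
 31     961   5.5678   .032258   |  80    6400   8.9443   .012500
 32    1024   5.6569   .031250   |  81    6561   9.0000   .012346
 33    1089   5.7446   .030303   |  82    6724   9.0554   .012195
 34    1156   5.8310   .029412   |  83    6889   9.1104   .012048
 35    1225   5.9161   .028571   |  84    7056   9.1652   .011905
 36    1296   6.0000   .027778   |  85    7225   9.2195   .011765
 37    1369   6.0828   .027027   |  86    7396   9.2736   .011628
 38    1444   6.1644   .026316   |  87    7569   9.3274   .011494
 39    1521   6.2450   .025641   |  88    7744   9.3808   .011364
 40    1600   6.3246   .025000   |  89    7921   9.4340   .011236
 41    1681   6.4031   .024390   |  90    8100   9.4868   .011111
 42    1764   6.4807   .023810   |  91    8281   9.5394   .010989
 43    1849   6.5574   .023256   |  92    8464   9.5917   .010870
 44    1936   6.6332   .022727   |  93    8649   9.6437   .010753
 45    2025   6.7082   .022222   |  94    8836   9.6954   .010638
 46    2116   6.7823   .021739   |  95    9025   9.7468   .010526
 47    2209   6.8557   .021277   |  96    9216   9.7980   .010417
 48    2304   6.9282   .020833   |  97    9409   9.8489   .010309
 49    2401   7.0000   .020408   |  98    9604   9.8995   .010204
                                 |  99    9801   9.9499   .010101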


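TABLE I (continued)

  N      N²       √N       1/N   |   N      N²       √N       1/N
100   10000  10.0000   .010000   | 150   22500  12.2474   .006667
101   10201  10.0499   .009901   | 151   22801  12.2882   .006623
102   10404  10.0995   .009804   | 152   23104  12.3288   .006579
103   10609  10.1489   .009709   | 153   23409  12.3693   .006536
104   10816  10.1980   .009615   | 154   23716  12.4097   .006494
105   11025  10.2470   .009524   | 155   24025  12.4499   .006452
106   11236  10.2956   .009434   | 156   24336  12.4900   .006410
107   11449  10.3441   .009346   | 157   24649  12.5300   .006369
108   11664  10.3923   .009259   | 158   24964  12.5698   .006329
109   11881  10.4403   .009174   | 159   25281  12.6095   .006289
110   12100  10.4881   .009091   | 160   25600  12.6491   .006250
111   12321  10.5357   .009009   | 161   25921  12.6886   .006211
112   12544  10.5830   .008929   | 162   26244  12.7279   .006173
113   12769  10.6301   .008850   | 163   26569  12.7671   .006135
114   12996  10.6771   .008772   | 164   26896  12.8062   .006098
115   13225  10.7238   .008696   | 165   27225  12.8452   .006061
116   13456  10.7703   .008621   | 166   27556  12.8841   .006024
117   13689  10.8167   .008547   | 167   27889  12.9228   .005988
118   13924  10.8628   .008475   | 168   28224  12.9615   .005952
119   14161  10.9087   .008403   | 169   28561  13.0000   .005917
120   14400  10.9545   .008333   | 170   28900  13.0384   .005882
121   14641  11.0000   .008264   | 171   29241  13.0767   .005848
122   14884  11.0454   .008197   | 172   29584  13.1149   .005814
123   15129  11.0905   .008130   | 173   29929  13.1529   .005780
124   15376  11.1355   .008065   | 174   30276  13.1909   .005747
125   15625  11.1803   .008000   | 175   30625  13.2288   .005714
126   15876  11.2250   .007937   | 176   30976  13.2665   .005682
127   16129  11.2694   .007874   | 177   31329  13.3041   .005650
128   16384  11.3137   .007813   | 178   31684  13.3417   .005618
129   16641  11.3578   .007752   | 179   32041  13.3791   .005587
130   16900  11.4018   .007692   | 180   32400  13.4164   .005556
131   17161  11.4455   .007634   | 181   32761  13.4536   .005525
132   17424  11.4891   .007576   | 182   33124  13.4907   .005495
133   17689  11.5326   .007519   | 183   33489  13.5277   .005464
134   17956  11.5758   .007463   | 184   33856  13.5647   .005435
135   18225  11.6190   .007407   | 185   34225  13.6015   .005405
136   18496  11.6619   .007353   | 186   34596  13.6382   .005376
137   18769  11.7047   .007299   | 187   34969  13.6748   .005348
138   19044  11.7473   .007246   | 188   35344  13.7113   .005319
139   19321  11.7898   .007194   | 189   35721  13.7477   .005291
140   19600  11.8322   .007143   | 190   36100  13.7840   .005263
141   19881  11.8743   .007092   | 191   36481  13.8203   .005236
142   20164  11.9164   .007042   | 192   36864  13.8564   .005208
143   20449  11.9583   .006993   | 193   37249  13.8924   .005181
144   20736  12.0000   .006944   | 194   37636  13.9284   .005155
145   21025  12.0416   .006897   | 195   38025  13.9642   .005128
146   21316  12.0830   .006849   | 196   38416  14.0000   .005102
147   21609  12.1244   .006803   | 197   38809  14.0357   .005076
148   21904  12.1655   .006757   | 198   39204  14.0712   .005051
149   22201  12.2066   .006711   | 199   39601  14.1067   .005025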


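TABLE I (continued)

  N      N²       √N       1/N   |   N      N²       √N       1/N
200   40000  14.1421   .005000   | 250   62500  15.8114   .004000
201   40401  14.1774   .004975   | 251   63001  15.8430   .003984
202   40804  14.2127   .004950   | 252   63504  15.8745   .003968
203   41209  14.2478   .004926   | 253   64009  15.9060   .003953
204   41616  14.2829   .004902   | 254   64516  15.9374   .003937
205   42025  14.3178   .004878   | 255   65025  15.9687   .003922
206   42436  14.3527   .004854   | 256   65536  16.0000   .003906
207   42849  14.3875   .004831   | 257   66049  16.0312   .003891
208   43264  14.4222   .004808   | 258   66564  16.0624   .003876
209   43681  14.4568   .004785   | 259   67081  16.0935   .003861
210   44100  14.4914   .004762   | 260   67600  16.1245   .003846
211   44521  14.5258   .004739   | 261   68121  16.1555   .003831
212   44944  14.5602   .004717   | 262   68644  16.1864   .003817
213   45369  14.5945   .004695   | 263   69169  16.2173   .003802
214   45796  14.6287   .004673   | 264   69696  16.2481   .003788
215   46225  14.6629   .004651   | 265   70225  16.2788   .003774
216   46656  14.6969   .004630   | 266   70756  16.3095   .003759
217   47089  14.7309   .004608   | 267   71289  16.3401   .003745
218   47524  14.7648   .004587   | 268   71824  16.3707   .003731
219   47961  14.7986   .004566   | 269   72361  16.4012   .003717
220   48400  14.8324   .004545   | 270   72900  16.4317   .003704
221   48841  14.8661   .004525   | 271   73441  16.4621   .003690
222   49284  14.8997   .004505   | 272   73984  16.4924   .003676
223   49729  14.9332   .004484   | 273   74529  16.5227   .003663
224   50176  14.9666   .004464   | 274   75076  16.5529   .003650
225   50625  15.0000   .004444   | 275   75625  16.5831   .003636
226   51076  15.0333   .004425   | 276   76176  16.6132   .003623
227   51529  15.0665   .004405   | 277   76729  16.6433   .003610
228   51984  15.0997   .004386   | 278   77284  16.6733   .003597
229   52441  15.1327   .004367   | 279   77841  16.7033   .003584
230   52900  15.1658   .004348   | 280   78400  16.7332   .003571
231   53361  15.1987   .004329   | 281   78961  16.7631   .003559
232   53824  15.2315   .004310   | 282   79524  16.7929   .003546
233   54289  15.2643   .004292   | 283   80089  16.8226   .003534
234   54756  15.2971   .004274   | 284   80656  16.8523   .003521
235   55225  15.3297   .004255   | 285   81225  16.8819   .003509
236   55696  15.3623   .004237   | 286   81796  16.9115   .003497
237   56169  15.3948   .004219   | 287   82369  16.9411   .003484
238   56644  15.4272   .004202   | 288   82944  16.9706   .003472
239   57121  15.4596   .004184   | 289   83521  17.0000   .003460
240   57600  15.4919   .004167   | 290   84100  17.0294   .003448
241   58081  15.5242   .004149   | 291   84681  17.0587   .003436
242   58564  15.5563   .004132   | 292   85264  17.0880   .003425
243   59049  15.5885   .004115   | 293   85849  17.1172   .003413
244   59536  15.6205   .004098   | 294   86436  17.1464   .003401
245   60025  15.6525   .004082   | 295   87025  17.1756   .003390
246   60516  15.6844   .004065   | 296   87616  17.2047   .003378
247   61009  15.7162   .004049   | 297   88209  17.2337   .003367
248   61504  15.7480   .004032   | 298   88804  17.2627   .003356
249   62001  15.7797   .004016   | 299   89401  17.2916   .003344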


TABLE I (continued)

  N      N²       √N       1/N   |   N      N²       √N       1/N
300   90000  17.3205   .003333   | 350  122500  18.7083   .002857
301   90601  17.3494   .003322   | 351  123201  18.7350   .002849
302   91204  17.3781   .003311   | 352  123904  18.7617   .002841
303   91809  17.4069   .003300   | 353  124609  18.7883   .002833
304   92416  17.4356   .003289   | 354  125316  18.8149   .002825
305   93025  17.4642   .003279   | 355  126025  18.8414   .002817
306   93636  17.4929   .003268   | 356  126736  18.8680   .002809
307   94249  17.5214   .003257   | 357  127449  18.8944   .002801
308   94864  17.5499   .003247   | 358  128164  18.9209   .002793
309   95481  17.5784   .003236   | 359  128881  18.9473   .002786
310   96100  17.6068   .003226   | 360  129600  18.9737   .002778
311   96721  17.6352   .003215   | 361  130321  19.0000   .002770
312   97344  17.6635   .003205   | 362  131044  19.0263   .002762
313   97969  17.6918   .003195   | 363  131769  19.0526   .002755
314   98596  17.7200   .003185   | 364  132496  19.0788   .002747
315   99225  17.7482   .003175   | 365  133225  19.1050   .002740
316   99856  17.7764   .003165   | 366  133956  19.1311   .002732
317  100489  17.8045   .003155   | 367  134689  19.1572   .002725
318  101124  17.8326   .003145   | 368  135424  19.1833   .002717
319  101761  17.8606   .003135   | 369  136161  19.2094   .002710
320  102400  17.8885   .003125   | 370  136900  19.2354   .002703
321  103041  17.9165   .003115   | 371  137641  19.2614   .002695
322  103684  17.9444   .003106   | 372  138384  19.2873   .002688
323  104329  17.9722   .003096   | 373  139129  19.3132   .002681
324  104976  18.0000   .003086   | 374  139876  19.3391   .002674
325  105625  18.0278   .003077   | 375  140625  19.3649   .002667
326  106276  18.0555   .003067   | 376  141376  19.3907   .002660
327  106929  18.0831   .003058   | 377  142129  19.4165   .002653
328  107584  18.1108   .003049   | 378  142884  19.4422   .002646
329  108241  18.1384   .003040   | 379  143641  19.4679   .002639
330  108900  18.1659   .003030   | 380  144400  19.4936   .002632
331  109561  18.1934   .003021   | 381  145161  19.5192   .002625
332  110224  18.2209   .003012   | 382  145924  19.5448   .002618
333  110889  18.2483   .003003   | 383  146689  19.5704   .002611
334  111556  18.2757   .002994   | 384  147456  19.5959   .002604
335  112225  18.3030   .002985   | 385  148225  19.6214   .002597
336  112896  18.3303   .002976   | 386  148996  19.6469   .002591
337  113569  18.3576   .002967   | 387  149769  19.6723   .002584
338  114244  18.3848   .002959   | 388  150544  19.6977   .002577
339  114921  18.4120   .002950   | 389  151321  19.7231   .002571
340  115600  18.4391   .002941   | 390  152100  19.7484   .002564
341  116281  18.4662   .002933   | 391  152881  19.7737   .002558
342  116964  18.4932   .002924   | 392  153664  19.7990   .002551
343  117649  18.5203   .002915   | 393  154449  19.8242   .002545
344  118336  18.5472   .002907   | 394  155236  19.8494   .002538
345  119025  18.5742   .002899   | 395  156025  19.8746   .002532
346  119716  18.6011   .002890   | 396  156816  19.8997   .002525
347  120409  18.6279   .002882   | 397  157609  19.9249   .002519
348  121104  18.6548   .002874   | 398  158404  19.9499   .002513
349  121801  18.6815   .002865   | 399  159201  19.9750   .002506


398 APPENDIX B 


TABLE I (continued)

N N² √N 1/N | N N² √N 1/N


400 160000 20.0000 .002500 | 450 202500 21.2132 .002222
401 160801 20.0250 .002494 | 451 203401 21.2368 .002217
402 161604 20.0499 .002488 | 452 204304 21.2603 .002212
403 162409 20.0749 .002481 | 453 205209 21.2838 .002208
404 163216 20.0998 .002475 | 454 206116 21.3073 .002203
405 164025 20.1246 .002469 | 455 207025 21.3307 .002198
406 164836 20.1494 .002463 | 456 207936 21.3542 .002193
407 165649 20.1742 .002457 | 457 208849 21.3776 .002188
408 166464 20.1990 .002451 | 458 209764 21.4009 .002183
409 167281 20.2237 .002445 | 459 210681 21.4243 .002179
410 168100 20.2485 .002439 | 460 211600 21.4476 .002174
411 168921 20.2731 .002433 | 461 212521 21.4709 .002169
412 169744 20.2978 .002427 | 462 213444 21.4942 .002165
413 170569 20.3224 .002421 | 463 214369 21.5174 .002160
414 171396 20.3470 .002415 | 464 215296 21.5407 .002155
415 172225 20.3715 .002410 | 465 216225 21.5639 .002151
416 173056 20.3961 .002404 | 466 217156 21.5870 .002146
417 173889 20.4206 .002398 | 467 218089 21.6102 .002141
418 174724 20.4450 .002392 | 468 219024 21.6333 .002137
419 175561 20.4695 .002387 | 469 219961 21.6564 .002132
420 176400 20.4939 .002381 | 470 220900 21.6795 .002128
421 177241 20.5183 .002375 | 471 221841 21.7025 .002123
422 178084 20.5426 .002370 | 472 222784 21.7256 .002119
423 178929 20.5670 .002364 | 473 223729 21.7486 .002114
424 179776 20.5913 .002358 | 474 224676 21.7715 .002110
425 180625 20.6155 .002353 | 475 225625 21.7945 .002105
426 181476 20.6398 .002347 | 476 226576 21.8174 .002101
427 182329 20.6640 .002342 | 477 227529 21.8403 .002096
428 183184 20.6882 .002336 | 478 228484 21.8632 .002092
429 184041 20.7123 .002331 | 479 229441 21.8861 .002088
430 184900 20.7364 .002326 | 480 230400 21.9089 .002083
431 185761 20.7605 .002320 | 481 231361 21.9317 .002079
432 186624 20.7846 .002315 | 482 232324 21.9545 .002075
433 187489 20.8087 .002309 | 483 233289 21.9773 .002070
434 188356 20.8327 .002304 | 484 234256 22.0000 .002066
435 189225 20.8567 .002299 | 485 235225 22.0227 .002062
436 190096 20.8806 .002294 | 486 236196 22.0454 .002058
437 190969 20.9045 .002288 | 487 237169 22.0681 .002053
438 191844 20.9284 .002283 | 488 238144 22.0907 .002049
439 192721 20.9523 .002278 | 489 239121 22.1133 .002045
440 193600 20.9762 .002273 | 490 240100 22.1359 .002041
441 194481 21.0000 .002268 | 491 241081 22.1585 .002037
442 195364 21.0238 .002262 | 492 242064 22.1811 .002033
443 196249 21.0476 .002257 | 493 243049 22.2036 .002028
444 197136 21.0713 .002252 | 494 244036 22.2261 .002024
445 198025 21.0950 .002247 | 495 245025 22.2486 .002020
446 198916 21.1187 .002242 | 496 246016 22.2711 .002016
447 199809 21.1424 .002237 | 497 247009 22.2935 .002012
448 200704 21.1660 .002232 | 498 248004 22.3159 .002008
449 201601 21.1896 .002227 | 499 249001 22.3383 .002004


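TABLE I (continued)

N N² √N 1/N | N N² √N 1/N

500 250000 22.3607 .002000 | 550 302500 23.4521 .001818
501 251001 22.3830 .001996 | 551 303601 23.4734 .001815
502 252004 22.4054 .001992 | 552 304704 23.4947 .001812
503 253009 22.4277 .001988 | 553 305809 23.5160 .001808
504 254016 22.4499 .001984 | 554 306916 23.5372 .001805
505 255025 22.4722 .001980 | 555 308025 23.5584 .001802
506 256036 22.4944 .001976 | 556 309136 23.5797 .001799
507 257049 22.5167 .001972 | 557 310249 23.6008 .001795
508 258064 22.5389 .001969 | 558 311364 23.6220 .001792
509 259081 22.5610 .001965 | 559 312481 23.6432 .001789
510 260100 22.5832 .001961 | 560 313600 23.6643 .001786
511 261121 22.6053 .001957 | 561 314721 23.6854 .001783
512 262144 22.6274 .001953 | 562 315844 23.7065 .001779
513 263169 22.6495 .001949 | 563 316969 23.7276 .001776
514 264196 22.6716 .001946 | 564 318096 23.7487 .001773
515 265225 22.6936 .001942 | 565 319225 23.7697 .001770
516 266256 22.7156 .001938 | 566 320356 23.7908 .001767
517 267289 22.7376 .001934 | 567 321489 23.8118 .001764
518 268324 22.7596 .001931 | 568 322624 23.8328 .001761
519 269361 22.7816 .001927 | 569 323761 23.8537 .001757
520 270400 22.8035 .001923 | 570 324900 23.8747 .001754
521 271441 22.8254 .001919 | 571 326041 23.8956 .001751
522 272484 22.8473 .001916 | 572 327184 23.9165 .001748
523 273529 22.8692 .001912 | 573 328329 23.9374 .001745
524 274576 22.8910 .001908 | 574 329476 23.9583 .001742
525 275625 22.9129 .001905 | 575 330625 23.9792 .001739
526 276676 22.9347 .001901 | 576 331776 24.0000 .001736
527 277729 22.9565 .001898 | 577 332929 24.0208 .001733
528 278784 22.9783 .001894 | 578 334084 24.0416 .001730
529 279841 23.0000 .001890 | 579 335241 24.0624 .001727
530 280900 23.0217 .001887 | 580 336400 24.0832 .001724
531 281961 23.0434 .001883 | 581 337561 24.1039 .001721
532 283024 23.0651 .001880 | 582 338724 24.1247 .001718
533 284089 23.0868 .001876 | 583 339889 24.1454 .001715
534 285156 23.1084 .001873 | 584 341056 24.1661 .001712
535 286225 23.1301 .001869 | 585 342225 24.1868 .001709
536 287296 23.1517 .001866 | 586 343396 24.2074 .001706
537 288369 23.1733 .001862 | 587 344569 24.2281 .001704
538 289444 23.1948 .001859 | 588 345744 24.2487 .001701
539 290521 23.2164 .001855 | 589 346921 24.2693 .001698
540 291600 23.2379 .001852 | 590 348100 24.2899 .001695
541 292681 23.2594 .001848 | 591 349281 24.3105 .001692
542 293764 23.2809 .001845 | 592 350464 24.3311 .001689
543 294849 23.3024 .001842 | 593 351649 24.3516 .001686
544 295936 23.3238 .001838 | 594 352836 24.3721 .001684
545 297025 23.3452 .001835 | 595 354025 24.3926 .001681
546 298116 23.3666 .001832 | 596 355216 24.4131 .001678
547 299209 23.3880 .001828 | 597 356409 24.4336 .001675
548 300304 23.4094 .001825 | 598 357604 24.4540 .001672
549 301401 23.4307 .001821 | 599 358801 24.4745 .001669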


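TABLE I (continued)

N N² √N 1/N | N N² √N 1/N

600 360000 24.4949 .001667 | 650 422500 25.4951 .001538
601 361201 24.5153 .001664 | 651 423801 25.5147 .001536
602 362404 24.5357 .001661 | 652 425104 25.5343 .001534
603 363609 24.5561 .001658 | 653 426409 25.5539 .001531
604 364816 24.5764 .001656 | 654 427716 25.5734 .001529
605 366025 24.5967 .001653 | 655 429025 25.5930 .001527
606 367236 24.6171 .001650 | 656 430336 25.6125 .001524
607 368449 24.6374 .001647 | 657 431649 25.6320 .001522
608 369664 24.6577 .001645 | 658 432964 25.6515 .001520
609 370881 24.6779 .001642 | 659 434281 25.6710 .001517
610 372100 24.6982 .001639 | 660 435600 25.6905 .001515
611 373321 24.7184 .001637 | 661 436921 25.7099 .001513
612 374544 24.7386 .001634 | 662 438244 25.7294 .001511
613 375769 24.7588 .001631 | 663 439569 25.7488 .001508
614 376996 24.7790 .001629 | 664 440896 25.7682 .001506
615 378225 24.7992 .001626 | 665 442225 25.7876 .001504
616 379456 24.8193 .001623 | 666 443556 25.8070 .001502
617 380689 24.8395 .001621 | 667 444889 25.8263 .001499
618 381924 24.8596 .001618 | 668 446224 25.8457 .001497
619 383161 24.8797 .001616 | 669 447561 25.8650 .001495
620 384400 24.8998 .001613 | 670 448900 25.8844 .001493
621 385641 24.9199 .001610 | 671 450241 25.9037 .001490
622 386884 24.9399 .001608 | 672 451584 25.9230 .001488
623 388129 24.9600 .001605 | 673 452929 25.9422 .001486
624 389376 24.9800 .001603 | 674 454276 25.9615 .001484
625 390625 25.0000 .001600 | 675 455625 25.9808 .001481
626 391876 25.0200 .001597 | 676 456976 26.0000 .001479
627 393129 25.0400 .001595 | 677 458329 26.0192 .001477
628 394384 25.0599 .001592 | 678 459684 26.0384 .001475
629 395641 25.0799 .001590 | 679 461041 26.0576 .001473
630 396900 25.0998 .001587 | 680 462400 26.0768 .001471
631 398161 25.1197 .001585 | 681 463761 26.0960 .001468
632 399424 25.1396 .001582 | 682 465124 26.1151 .001466
633 400689 25.1595 .001580 | 683 466489 26.1343 .001464
634 401956 25.1794 .001577 | 684 467856 26.1534 .001462
635 403225 25.1992 .001575 | 685 469225 26.1725 .001460
636 404496 25.2190 .001572 | 686 470596 26.1916 .001458
637 405769 25.2389 .001570 | 687 471969 26.2107 .001456
638 407044 25.2587 .001567 | 688 473344 26.2298 .001453
639 408321 25.2784 .001565 | 689 474721 26.2488 .001451
640 409600 25.2982 .001563 | 690 476100 26.2679 .001449
641 410881 25.3180 .001560 | 691 477481 26.2869 .001447
642 412164 25.3377 .001558 | 692 478864 26.3059 .001445
643 413449 25.3574 .001555 | 693 480249 26.3249 .001443
644 414736 25.3772 .001553 | 694 481636 26.3439 .001441
645 416025 25.3969 .001550 | 695 483025 26.3629 .001439
646 417316 25.4165 .001548 | 696 484416 26.3818 .001437
647 418609 25.4362 .001546 | 697 485809 26.4008 .001435
648 419904 25.4558 .001543 | 698 487204 26.4197 .001433
649 421201 25.4755 .001541 | 699 488601 26.4386 .001431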


TABLE I (continued)

N N² √N 1/N | N N² √N 1/N

700 490000 26.4575 .001429 | 750 562500 27.3861 .001333
701 491401 26.4764 .001427 | 751 564001 27.4044 .001332
702 492804 26.4953 .001425 | 752 565504 27.4226 .001330
703 494209 26.5141 .001422 | 753 567009 27.4408 .001328
704 495616 26.5330 .001420 | 754 568516 27.4591 .001326
705 497025 26.5518 .001418 | 755 570025 27.4773 .001325
706 498436 26.5707 .001416 | 756 571536 27.4955 .001323
707 499849 26.5895 .001414 | 757 573049 27.5136 .001321
708 501264 26.6083 .001412 | 758 574564 27.5318 .001319
709 502681 26.6271 .001410 | 759 576081 27.5500 .001318
710 504100 26.6458 .001408 | 760 577600 27.5681 .001316
711 505521 26.6646 .001406 | 761 579121 27.5862 .001314
712 506944 26.6833 .001404 | 762 580644 27.6043 .001312
713 508369 26.7021 .001403 | 763 582169 27.6225 .001311
714 509796 26.7208 .001401 | 764 583696 27.6405 .001309
715 511225 26.7395 .001399 | 765 585225 27.6586 .001307
716 512656 26.7582 .001397 | 766 586756 27.6767 .001305
717 514089 26.7769 .001395 | 767 588289 27.6948 .001304
718 515524 26.7955 .001393 | 768 589824 27.7128 .001302
719 516961 26.8142 .001391 | 769 591361 27.7308 .001300
720 518400 26.8328 .001389 | 770 592900 27.7489 .001299
721 519841 26.8514 .001387 | 771 594441 27.7669 .001297
722 521284 26.8701 .001385 | 772 595984 27.7849 .001295
723 522729 26.8887 .001383 | 773 597529 27.8029 .001294
724 524176 26.9072 .001381 | 774 599076 27.8209 .001292
725 525625 26.9258 .001379 | 775 600625 27.8388 .001290
726 527076 26.9444 .001377 | 776 602176 27.8568 .001289
727 528529 26.9629 .001376 | 777 603729 27.8747 .001287
728 529984 26.9815 .001374 | 778 605284 27.8927 .001285
729 531441 27.0000 .001372 | 779 606841 27.9106 .001284
730 532900 27.0185 .001370 | 780 608400 27.9285 .001282
731 534361 27.0370 .001368 | 781 609961 27.9464 .001280
732 535824 27.0555 .001366 | 782 611524 27.9643 .001279
733 537289 27.0740 .001364 | 783 613089 27.9821 .001277
734 538756 27.0924 .001362 | 784 614656 28.0000 .001276
735 540225 27.1109 .001361 | 785 616225 28.0179 .001274
736 541696 27.1293 .001359 | 786 617796 28.0357 .001272
737 543169 27.1477 .001357 | 787 619369 28.0535 .001271
738 544644 27.1662 .001355 | 788 620944 28.0713 .001269
739 546121 27.1846 .001353 | 789 622521 28.0891 .001267
740 547600 27.2029 .001351 | 790 624100 28.1069 .001266
741 549081 27.2213 .001350 | 791 625681 28.1247 .001264
742 550564 27.2397 .001348 | 792 627264 28.1425 .001263
743 552049 27.2580 .001346 | 793 628849 28.1603 .001261
744 553536 27.2764 .001344 | 794 630436 28.1780 .001259
745 555025 27.2947 .001342 | 795 632025 28.1957 .001258
746 556516 27.3130 .001340 | 796 633616 28.2135 .001256
747 558009 27.3313 .001339 | 797 635209 28.2312 .001255
748 559504 27.3496 .001337 | 798 636804 28.2489 .001253
749 561001 27.3679 .001335 | 799 638401 28.2666 .001252


TABLE I (continued)

N N² √N 1/N | N N² √N 1/N

800 640000 28.2843 .001250 | 850 722500 29.1548 .001176
801 641601 28.3019 .001248 | 851 724201 29.1719 .001175
802 643204 28.3196 .001247 | 852 725904 29.1890 .001174
803 644809 28.3373 .001245 | 853 727609 29.2062 .001172
804 646416 28.3549 .001244 | 854 729316 29.2233 .001171
805 648025 28.3725 .001242 | 855 731025 29.2404 .001170
806 649636 28.3901 .001241 | 856 732736 29.2575 .001168
807 651249 28.4077 .001239 | 857 734449 29.2746 .001167
808 652864 28.4253 .001238 | 858 736164 29.2916 .001166
809 654481 28.4429 .001236 | 859 737881 29.3087 .001164
810 656100 28.4605 .001235 | 860 739600 29.3258 .001163
811 657721 28.4781 .001233 | 861 741321 29.3428 .001161
812 659344 28.4956 .001232 | 862 743044 29.3598 .001160
813 660969 28.5132 .001230 | 863 744769 29.3769 .001159
814 662596 28.5307 .001229 | 864 746496 29.3939 .001157
815 664225 28.5482 .001227 | 865 748225 29.4109 .001156
816 665856 28.5657 .001225 | 866 749956 29.4279 .001155
817 667489 28.5832 .001224 | 867 751689 29.4449 .001153
818 669124 28.6007 .001222 | 868 753424 29.4618 .001152
819 670761 28.6182 .001221 | 869 755161 29.4788 .001151
820 672400 28.6356 .001220 | 870 756900 29.4958 .001149
821 674041 28.6531 .001218 | 871 758641 29.5127 .001148
822 675684 28.6705 .001217 | 872 760384 29.5296 .001147
823 677329 28.6880 .001215 | 873 762129 29.5466 .001145
824 678976 28.7054 .001214 | 874 763876 29.5635 .001144
825 680625 28.7228 .001212 | 875 765625 29.5804 .001143
826 682276 28.7402 .001211 | 876 767376 29.5973 .001142
827 683929 28.7576 .001209 | 877 769129 29.6142 .001140
828 685584 28.7750 .001208 | 878 770884 29.6311 .001139
829 687241 28.7924 .001206 | 879 772641 29.6479 .001138
830 688900 28.8097 .001205 | 880 774400 29.6648 .001136
831 690561 28.8271 .001203 | 881 776161 29.6816 .001135
832 692224 28.8444 .001202 | 882 777924 29.6985 .001134
833 693889 28.8617 .001200 | 883 779689 29.7153 .001133
834 695556 28.8791 .001199 | 884 781456 29.7321 .001131
835 697225 28.8964 .001198 | 885 783225 29.7489 .001130
836 698896 28.9137 .001196 | 886 784996 29.7658 .001129
837 700569 28.9310 .001195 | 887 786769 29.7825 .001127
838 702244 28.9482 .001193 | 888 788544 29.7993 .001126
839 703921 28.9655 .001192 | 889 790321 29.8161 .001125
840 705600 28.9828 .001190 | 890 792100 29.8329 .001124
841 707281 29.0000 .001189 | 891 793881 29.8496 .001122
842 708964 29.0172 .001188 | 892 795664 29.8664 .001121
843 710649 29.0345 .001186 | 893 797449 29.8831 .001120
844 712336 29.0517 .001185 | 894 799236 29.8998 .001119
845 714025 29.0689 .001183 | 895 801025 29.9166 .001117
846 715716 29.0861 .001182 | 896 802816 29.9333 .001116
847 717409 29.1033 .001181 | 897 804609 29.9500 .001115
848 719104 29.1204 .001179 | 898 806404 29.9666 .001114
849 720801 29.1376 .001178 | 899 808201 29.9833 .001112


TABLE I (continued)

N N² √N 1/N | N N² √N 1/N

900 810000 30.0000 .001111 | 950 902500 30.8221 .001053
901 811801 30.0167 .001110 | 951 904401 30.8383 .001052
902 813604 30.0333 .001109 | 952 906304 30.8545 .001050
903 815409 30.0500 .001107 | 953 908209 30.8707 .001049
904 817216 30.0666 .001106 | 954 910116 30.8869 .001048
905 819025 30.0832 .001105 | 955 912025 30.9031 .001047
906 820836 30.0998 .001104 | 956 913936 30.9192 .001046
907 822649 30.1164 .001103 | 957 915849 30.9354 .001045
908 824464 30.1330 .001101 | 958 917764 30.9516 .001044
909 826281 30.1496 .001100 | 959 919681 30.9677 .001043
910 828100 30.1662 .001099 | 960 921600 30.9839 .001042
911 829921 30.1828 .001098 | 961 923521 31.0000 .001041
912 831744 30.1993 .001096 | 962 925444 31.0161 .001040
913 833569 30.2159 .001095 | 963 927369 31.0322 .001038
914 835396 30.2324 .001094 | 964 929296 31.0483 .001037
915 837225 30.2490 .001093 | 965 931225 31.0644 .001036
916 839056 30.2655 .001092 | 966 933156 31.0805 .001035
917 840889 30.2820 .001091 | 967 935089 31.0966 .001034
918 842724 30.2985 .001089 | 968 937024 31.1127 .001033
919 844561 30.3150 .001088 | 969 938961 31.1288 .001032
920 846400 30.3315 .001087 | 970 940900 31.1448 .001031
921 848241 30.3480 .001086 | 971 942841 31.1609 .001030
922 850084 30.3645 .001085 | 972 944784 31.1769 .001029
923 851929 30.3809 .001083 | 973 946729 31.1929 .001028
924 853776 30.3974 .001082 | 974 948676 31.2090 .001027
925 855625 30.4138 .001081 | 975 950625 31.2250 .001026
926 857476 30.4302 .001080 | 976 952576 31.2410 .001025
927 859329 30.4467 .001079 | 977 954529 31.2570 .001024
928 861184 30.4631 .001078 | 978 956484 31.2730 .001022
929 863041 30.4795 .001076 | 979 958441 31.2890 .001021
930 864900 30.4959 .001075 | 980 960400 31.3050 .001020
931 866761 30.5123 .001074 | 981 962361 31.3209 .001019
932 868624 30.5287 .001073 | 982 964324 31.3369 .001018
933 870489 30.5450 .001072 | 983 966289 31.3528 .001017
934 872356 30.5614 .001071 | 984 968256 31.3688 .001016
935 874225 30.5778 .001070 | 985 970225 31.3847 .001015
936 876096 30.5941 .001068 | 986 972196 31.4006 .001014
937 877969 30.6105 .001067 | 987 974169 31.4166 .001013
938 879844 30.6268 .001066 | 988 976144 31.4325 .001012
939 881721 30.6431 .001065 | 989 978121 31.4484 .001011
940 883600 30.6594 .001064 | 990 980100 31.4643 .001010
941 885481 30.6757 .001063 | 991 982081 31.4802 .001009
942 887364 30.6920 .001062 | 992 984064 31.4960 .001008
943 889249 30.7083 .001060 | 993 986049 31.5119 .001007
944 891136 30.7246 .001059 | 994 988036 31.5278 .001006
945 893025 30.7409 .001058 | 995 990025 31.5436 .001005
946 894916 30.7571 .001057 | 996 992016 31.5595 .001004
947 896809 30.7734 .001056 | 997 994009 31.5753 .001003
948 898704 30.7896 .001055 | 998 996004 31.5911 .001002
949 900601 30.8058 .001054 | 999 998001 31.6070 .001001
                           | 1000 1000000 31.6228 .001000
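Every entry of Table I is determined by N alone, so any printed line can be regenerated or spot-checked by machine. The sketch below is illustrative only and is not the authors' procedure. It assumes Python 3 and the standard `decimal` module, with half-up rounding to four places for √N and six places for 1/N; that rounding rule is an assumption about the printed convention. The function names `table_i_entry` and `table_i_line` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP, getcontext

getcontext().prec = 28  # plenty of digits before the final rounding

FOUR_PLACES = Decimal("0.0001")
SIX_PLACES = Decimal("0.000001")


def table_i_entry(n: int) -> str:
    """Return 'N N^2 sqrt(N) 1/N' formatted like one half-line of Table I."""
    square = n * n
    root = Decimal(n).sqrt().quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)
    recip = (Decimal(1) / Decimal(n)).quantize(SIX_PLACES, rounding=ROUND_HALF_UP)
    # The printed table omits the leading zero of the reciprocal: ".002000".
    return f"{n} {square} {root} {str(recip)[1:]}"


def table_i_line(n: int, offset: int = 50) -> str:
    """One printed line: the entry for n on the left, n + offset on the right."""
    return f"{table_i_entry(n)} | {table_i_entry(n + offset)}"


if __name__ == "__main__":
    for n in range(500, 505):
        print(table_i_line(n))
```

Exact decimal arithmetic is used instead of binary floats so that exact ties, such as 1/640 = .0015625, round the same way every time.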


TABLE II. Ordinates and Areas of the Normal Curve
(In terms of σ units from mean)

z Area Ordinate | z Area Ordinate | z Area Ordinate
.00 .0000 .3989 | .50 .1915 .3521 | 1.00 .3413 .2420
.01 .0040 .3989 | .51 .1950 .3503 | 1.01 .3438 .2396
.02 .0080 .3989 | .52 .1985 .3485 | 1.02 .3461 .2371
.03 .0120 .3988 | .53 .2019 .3467 | 1.03 .3485 .2347
.04 .0160 .3986 | .54 .2054 .3448 | 1.04 .3508 .2323
.05 .0199 .3984 | .55 .2088 .3429 | 1.05 .3531 .2299
.06 .0239 .3982 | .56 .2123 .3410 | 1.06 .3554 .2275
.07 .0279 .3980 | .57 .2157 .3391 | 1.07 .3577 .2251
.08 .0319 .3977 | .58 .2190 .3372 | 1.08 .3599 .2227
.09 .0359 .3973 | .59 .2224 .3352 | 1.09 .3621 .2203
.10 .0398 .3970 | .60 .2257 .3332 | 1.10 .3643 .2179
.11 .0438 .3965 | .61 .2291 .3312 | 1.11 .3665 .2155
.12 .0478 .3961 | .62 .2324 .3292 | 1.12 .3686 .2131
.13 .0517 .3956 | .63 .2357 .3271 | 1.13 .3708 .2107
.14 .0557 .3951 | .64 .2389 .3251 | 1.14 .3729 .2083
.15 .0596 .3945 | .65 .2422 .3230 | 1.15 .3749 .2059
.16 .0636 .3939 | .66 .2454 .3209 | 1.16 .3770 .2036
.17 .0675 .3932 | .67 .2486 .3187 | 1.17 .3790 .2012
.18 .0714 .3925 | .68 .2517 .3166 | 1.18 .3810 .1989
.19 .0753 .3918 | .69 .2549 .3144 | 1.19 .3830 .1965
.20 .0793 .3910 | .70 .2580 .3123 | 1.20 .3849 .1942
.21 .0832 .3902 | .71 .2611 .3101 | 1.21 .3869 .1919
.22 .0871 .3894 | .72 .2642 .3079 | 1.22 .3888 .1895
.23 .0910 .3885 | .73 .2673 .3056 | 1.23 .3907 .1872
.24 .0948 .3876 | .74 .2703 .3034 | 1.24 .3925 .1849
.25 .0987 .3867 | .75 .2734 .3011 | 1.25 .3944 .1826
.26 .1026 .3857 | .76 .2764 .2989 | 1.26 .3962 .1804
.27 .1064 .3847 | .77 .2794 .2966 | 1.27 .3980 .1781
.28 .1103 .3836 | .78 .2823 .2943 | 1.28 .3997 .1758
.29 .1141 .3825 | .79 .2852 .2920 | 1.29 .4015 .1736
.30 .1179 .3814 | .80 .2881 .2897 | 1.30 .4032 .1714
.31 .1217 .3802 | .81 .2910 .2874 | 1.31 .4049 .1691
.32 .1255 .3790 | .82 .2939 .2850 | 1.32 .4066 .1669
.33 .1293 .3778 | .83 .2967 .2827 | 1.33 .4082 .1647
.34 .1331 .3765 | .84 .2995 .2803 | 1.34 .4099 .1626
.35 .1368 .3752 | .85 .3023 .2780 | 1.35 .4115 .1604
.36 .1406 .3739 | .86 .3051 .2756 | 1.36 .4131 .1582
.37 .1443 .3725 | .87 .3078 .2732 | 1.37 .4147 .1561
.38 .1480 .3712 | .88 .3106 .2709 | 1.38 .4162 .1539
.39 .1517 .3697 | .89 .3133 .2685 | 1.39 .4177 .1518
.40 .1554 .3683 | .90 .3159 .2661 | 1.40 .4192 .1497
.41 .1591 .3668 | .91 .3186 .2637 | 1.41 .4207 .1476
.42 .1628 .3653 | .92 .3212 .2613 | 1.42 .4222 .1456
.43 .1664 .3637 | .93 .3238 .2589 | 1.43 .4236 .1435
.44 .1700 .3621 | .94 .3264 .2565 | 1.44 .4251 .1415
.45 .1736 .3605 | .95 .3289 .2541 | 1.45 .4265 .1394
.46 .1772 .3589 | .96 .3315 .2516 | 1.46 .4279 .1374
.47 .1808 .3572 | .97 .3340 .2492 | 1.47 .4292 .1354
.48 .1844 .3555 | .98 .3365 .2468 | 1.48 .4306 .1334
.49 .1879 .3538 | .99 .3389 .2444 | 1.49 .4319 .1315
.50 .1915 .3521 | 1.00 .3413 .2420 | 1.50 .4332 .1295




TABLE II (continued)

z Area Ordinate | z Area Ordinate | z Area Ordinate
1.50 .4332 .1295 | 2.00 .4772 .0540 | 2.50 .4938 .0175
1.51 .4345 .1276 | 2.01 .4778 .0529 | 2.51 .4940 .0171
1.52 .4357 .1257 | 2.02 .4783 .0519 | 2.52 .4941 .0167
1.53 .4370 .1238 | 2.03 .4788 .0508 | 2.53 .4943 .0163
1.54 .4382 .1219 | 2.04 .4793 .0498 | 2.54 .4945 .0158
1.55 .4394 .1200 | 2.05 .4798 .0488 | 2.55 .4946 .0154
1.56 .4406 .1182 | 2.06 .4803 .0478 | 2.56 .4948 .0151
1.57 .4418 .1163 | 2.07 .4808 .0468 | 2.57 .4949 .0147
1.58 .4429 .1145 | 2.08 .4812 .0459 | 2.58 .4951 .0143
1.59 .4441 .1127 | 2.09 .4817 .0449 | 2.59 .4952 .0139
1.60 .4452 .1109 | 2.10 .4821 .0440 | 2.60 .4953 .0136
1.61 .4463 .1092 | 2.11 .4826 .0431 | 2.61 .4955 .0132
1.62 .4474 .1074 | 2.12 .4830 .0422 | 2.62 .4956 .0129
1.63 .4484 .1057 | 2.13 .4834 .0413 | 2.63 .4957 .0126
1.64 .4495 .1040 | 2.14 .4838 .0404 | 2.64 .4959 .0122
1.65 .4505 .1023 | 2.15 .4842 .0396 | 2.65 .4960 .0119
1.66 .4515 .1006 | 2.16 .4846 .0387 | 2.66 .4961 .0116
1.67 .4525 .0989 | 2.17 .4850 .0379 | 2.67 .4962 .0113
1.68 .4535 .0973 | 2.18 .4854 .0371 | 2.68 .4963 .0110
1.69 .4545 .0957 | 2.19 .4857 .0363 | 2.69 .4964 .0107
1.70 .4554 .0940 | 2.20 .4861 .0355 | 2.70 .4965 .0104
1.71 .4564 .0925 | 2.21 .4864 .0347 | 2.71 .4966 .0101
1.72 .4573 .0909 | 2.22 .4868 .0339 | 2.72 .4967 .0099
1.73 .4582 .0893 | 2.23 .4871 .0332 | 2.73 .4968 .0096
1.74 .4591 .0878 | 2.24 .4875 .0325 | 2.74 .4969 .0093
1.75 .4599 .0863 | 2.25 .4878 .0317 | 2.75 .4970 .0091
1.76 .4608 .0848 | 2.26 .4881 .0310 | 2.76 .4971 .0088
1.77 .4616 .0833 | 2.27 .4884 .0303 | 2.77 .4972 .0086
1.78 .4625 .0818 | 2.28 .4887 .0297 | 2.78 .4973 .0084
1.79 .4633 .0804 | 2.29 .4890 .0290 | 2.79 .4974 .0081
1.80 .4641 .0790 | 2.30 .4893 .0283 | 2.80 .4974 .0079
1.81 .4649 .0775 | 2.31 .4896 .0277 | 2.81 .4975 .0077
1.82 .4656 .0761 | 2.32 .4898 .0270 | 2.82 .4976 .0075
1.83 .4664 .0748 | 2.33 .4901 .0264 | 2.83 .4977 .0073
1.84 .4671 .0734 | 2.34 .4904 .0258 | 2.84 .4977 .0071
1.85 .4678 .0721 | 2.35 .4906 .0252 | 2.85 .4978 .0069
1.86 .4686 .0707 | 2.36 .4909 .0246 | 2.86 .4979 .0067
1.87 .4693 .0694 | 2.37 .4911 .0241 | 2.87 .4979 .0065
1.88 .4699 .0681 | 2.38 .4913 .0235 | 2.88 .4980 .0063
1.89 .4706 .0669 | 2.39 .4916 .0229 | 2.89 .4981 .0061
1.90 .4713 .0656 | 2.40 .4918 .0224 | 2.90 .4981 .0060
1.91 .4719 .0644 | 2.41 .4920 .0219 | 2.91 .4982 .0058
1.92 .4726 .0632 | 2.42 .4922 .0213 | 2.92 .4982 .0056
1.93 .4732 .0620 | 2.43 .4925 .0208 | 2.93 .4983 .0055
1.94 .4738 .0608 | 2.44 .4927 .0203 | 2.94 .4984 .0053
1.95 .4744 .0596 | 2.45 .4929 .0198 | 2.95 .4984 .0051
1.96 .4750 .0584 | 2.46 .4931 .0194 | 2.96 .4985 .0050
1.97 .4756 .0573 | 2.47 .4932 .0189 | 2.97 .4985 .0048
1.98 .4761 .0562 | 2.48 .4934 .0184 | 2.98 .4986 .0047
1.99 .4767 .0551 | 2.49 .4936 .0180 | 2.99 .4986 .0046
2.00 .4772 .0540 | 2.50 .4938 .0175 | 3.00 .4987 .0044
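Table II comes from the unit normal curve. The area from the mean out to z equals ½·erf(z/√2), and the ordinate at z is e^(−z²/2)/√(2π). The minimal sketch below computes both quantities for any z, including values that fall between the tabled points. It assumes Python 3 with `math.erf` and prints to four places, like the table. This is not the procedure used to prepare the 1954 table, and the names `normal_area_and_ordinate` and `table_ii_entry` are hypothetical.

```python
from __future__ import annotations

import math


def normal_area_and_ordinate(z: float) -> tuple[float, float]:
    """Area from the mean to z, and ordinate y at z, for the unit normal curve."""
    area = 0.5 * math.erf(z / math.sqrt(2.0))
    ordinate = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return area, ordinate


def _drop_leading_zero(text: str) -> str:
    # The printed table writes ".50" and ".1915" rather than "0.50" and "0.1915".
    return text[1:] if text.startswith("0.") else text


def table_ii_entry(z: float) -> str:
    """Format z, area, and ordinate the way one column group of Table II prints them."""
    area, ordinate = normal_area_and_ordinate(z)
    parts = (f"{z:.2f}", f"{area:.4f}", f"{ordinate:.4f}")
    return " ".join(_drop_leading_zero(p) for p in parts)


if __name__ == "__main__":
    for z in (0.00, 0.50, 1.00, 1.96, 2.58, 3.00):
        print(table_ii_entry(z))
```

For negative z, the area has the same magnitude as for |z| but lies on the other side of the mean, and the ordinate is unchanged. The table therefore lists only non-negative z.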


TABLE III

p is proportion from lower extremity of distribution; z carries a negative sign whenever p is less than .500.

[Pages 406 to 410: the body of Table III was printed sideways, and its full title and entries did not survive text extraction.]

TABLE IV (continued)

[Pages 411 to 413: these pages were also printed sideways, and their entries did not survive text extraction.]
mi em mi TSE | 2609 oe ug 0% | om ee 669 Tog 

A bd d d 2 bd d d z bd d d 
bd r bd E bd E 


414 


EE 
4 GIE: j E ` OSF 9629" PS US geet: 


Joen 9989" 009 oe E9%9" een Ga ou 6929" geo" 099 

Joen ` op we Gr Dien 909 oe py Son sem IS oer | EZ een wg re 
Joen 9989" ong 867 £979" geen Lë” Si Zon een ee gr | PEZ wen Ap ser 
Joen 9989" eg Zor CITY geen see ap 1ISZ9 Gay ge” Ir | PER Aen BLS ar 
9929" 9969 PO” ` opp: Soen reen ee Up Ion seng PS or Geen og og Tr 
99%9 999 oe cop | gøy sen oe” oz | eg zeen oe oe | sem sem os” 0% 
9979 9989" 90 tër T9EY reen Tee 69° | 0OGeY 0389 oo ppp | Te” oo 18” 6lp 
9929" 999 LO”  £6p 1939" een ze” BOR" | Grey Ger Joe ep | 0820 geg as SH 
9929 GJEN So ` opt T929" en get 29% SHEY Aen 828 aH | 6339 en ee ug 
9979 GJEN 609 IGP" 0929 Grey ver 99 Son SIE Ge TP | Se men Fe” os: 
9929" gogo" ors’ 06 0939 gon oe GOR In Ten oe OK | Ze men ege erg: 
9939 F99 TIF mer men Jm me mr | ozo sep 199 eer | gay aen 98% tre 
9929" #989" erte 887 6239" on LES "mp "men gen 299 ger | szzy say 28” erg: 
9920" FYEN ere ët: mon  GPE9 SE” Z9 ron suen ggg" Je | SZZ% zeen ee ar 
999 KEQEN re or | Sez prer oe 19 Pen on FOG" os | Gay men 68% Dr 
9929”  £9E9 ote ggr Soen she” me 097 Ion on oe Ger Iren ën 068 ott 
Don GJEN og BSF SH) aen IM oer EPZO" mn oe rr | seen SHY eg ` op 
9070" Z989" ZIT gr | Lea men zë 8Cp eco’  TOE9 298% er | Ig men ze” gg: 
$929" Z989 STS" op: 2929 een ee Je Zon open 89% er | 0339 en og me 
S99" T989 og ISP 9039 SEN e 9Gp Ion 2609 6996 Ter | GIZ9 open rës 907 
#909" men oze Osh | on Aren ere D oreo’ een oe mr | 2IZg reg gë” gor 
en men Të Git: SGa9 ` een oe rer GEZ pen IS Ger | on Ig 968 For 
VOZ" Geen zeg SY Goen PEE Ire soe 6829" 0629 ag ger | sig” SY Lë sor 
P99 een EZT Ar Pg pop SH” zer SECH sen ie Jr | FI geg së” zg: 
Een ` goen re oz POZO een ere TSP 2829-9829" re oer leen Seen 669% 107 

2 bd d d 2 bd d d 2 bd d d z bd d d 
bd E Dd E bd Z bd # 


(ponurjuoo) A] TIEV L 


415 


416 


APPENDIX B 
TABLE V. Table of Random Numbers 




00-04 05-09 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49 
00 83474 29226 28313 50047 48885 15141 09967 41583 16311 37427 
01 03923 12282 93819 58928 25273 21305 34912 25859 36556 32280 
02 14513 51745 25987 46116 32723 14553 01890 75123 77090 86182 
03 89813 75362 13801 71825 45502 43603 00528 03315 80797 51954 
04 32167 08652 49524 24791 84877 71892 44795 32077 76302 40872 
05 41419 10395 90389 94960 98682 26763 41593 75984 36920 10095 
06 92598 05485 47358 39840 33510 40603 50204 80801 21792 01742 
07 44370 61741 80259 65432 47900 29031 91048 40456 62170 19789 
08 60770 74345 45182 15639 53398 85816 76665 24022 50982 52449 
09 18885 55615 59863 07591 03824 11293 91288 22314 33136 49537 
10 95977 99943 27874 65452 24880 52721 11748 97489 25505 95311 
11 13948 24893 90727 72819 73147 73969 69684 45497 43388 53054 
12 39828 87021 32726 45085 53523 75128 24268 12765 66799 41394 
13 32736 29722 43545 60914 53862 41737 86544 40180 33924 27858 
14 22844 77742 71572 07617 05136 09287 66488 47731 64881 53030 
15 69694 11080 15759 09183 85138 81561 42286 83489 62109 30034 
16 34176 42271 70176 16320 07336 39747 02510 96462 52222 96490 
17 97147 69734 68047 37417 69690 70957 64654 54441 15633 24937 
18 51396 58500 26926 31821 01710 33137 09045 62171 85939 22096 
19 61072 38873 20627 53366 57474 98386 56765 48994 87359 00194 
20 74153 72232 59491 53355 94270 06993 25306 80985 94216 19045 
21 43826 91447 33560 13859 87473 70388 90742 21200 26763 95272 
22 06081 03049 95964 54648 65039 68453 93891 68985 19932 73134 
23 86584 09836 82703 69606 68055 70436 72572 08402 53232 75154 
24 73158 59948 82129 22767 85068 61835 79218 64601 59854 43637 
25 65929 54411 47087 27745 35924 53146 90280 73174 51987 2158, 
26 95847 62612 19969 52243 81078 51215 17581 42364 16496 12808 
27 04676 69361 00065 64381 04068 20584 13768 84957 37497 73988 
28 03462 12273 85723 80945 14509 60281 24731 37852 34272 85003 
29 84994 42334 72269 98383 71589 10078 40046 96418 25271 28251 
30 46058 46944 82021 95088 28425 53576 31766 9813 
31 79369 42575 44260 23557 28077 38159 12850 36287 52301 .....
32 47619 63272 16060 49389 02296 27358 10203 78476 13620 06070
33 91936 46600 40241 82240 27313 17378 50813 42093 04975 33514
34 31243 70172 85247 02430 21072 41513 76442 14620 90890 73587
35 96046 48680 10662 69950 50120 10001 22292
36 97992 81395 19071 44288 17955 31596 57292 83102 ..... .....
37 59160 85187 66887 49709 30070 02666 76745 94570 58940 52746 
38 85919 38263 70617 39025 14090 24346 29285 64554 94188 59964 
39 10529 17903 03444 34875 17918 60255 25574 05170 72397 21948 
40 56367 93527 21720 34389 76432 80791 80439 032..
41 24715 21324 49067 35552 48193 10830 22090 ..... ..... 28004
42 58543 87341 22903 24735 60537 86466 69156 55159 96945 37613
43 01975 45181 46760 30200 93843 16941 72660 36059 34037 56202
44 ..... ..... 82143 61042 30948 00738 77290 58414 01386 69725
45 919 2 43514 33696 10583 60683 5314
46 68365 07355 38251 13864 66333 35803 51001 ..... ..... 10087
47 70233 78620 16528 33752 91647 18647 53531 47482 ..... 27458
48 10396 81046 04540 37692 87303 07304 01051 92710 90874 92322
49 33035 86608 45741 91437 05579 43612 58883 09915 51373 73264


TABLE V (continued)

00-04 05-09 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49
50 01488 54105 56710 34456 13726 25603 46658 16691 92055
51 91039 86693 33117 82587 58815 43678 84603 51777 83494
52 97917 00688 63123 23412 10727 00599 22016 51486 54016
53 46337 38963 37540 26441 00986 99585 87117 16266 58724
54 21712 73069 51596 90411 89252 17643 41832 40540 59248
55 93495 69441 87159 01861 91397 01619 87919 56387 01185
56 45910 55895 76010 83530 90488 56997 05612 64909 27063
57 37452 95988 27514 57345 91633 26215 44834 51327 29155
58 10180 14870 15714 66810 97797 78935 17329 75386 95475
59 66621 93383 09295 90661 43762 13701 94963 64078 56390
60 92209 17520 18238 86813 36706 03670 07977 28830 26262
61 64960 50275 48576 30170 42129 84032 93102 73613 90416
62 62185 23812 62209 19058 13152 87614 61544 80783 90455
63 23616 31210 35107 41977 99155 20977 56114 31960 17527
64 56405 78249 42902 73375 97809 29526 70411 77701 05042
65 97073 63638 71187 01733 88613 61705 90113 69624 67972
66 77881 03122 40584 89317 82373 47303 03014 90769 35749
67 93595 21518 12670 51415 51616 05550 45420 29035 62018
68 72688 81587 82231 07466 94468 90411 42513 20657 51871
69 86267 19961 92707 84216 38748 02725 64667 55127 55779
70 63726 61928 34705 16687 13368 45477 47099 95468 73601
71 51510 62248 06735 85675 76714 42145 28619 61096 23385
72 91916 56097 48109 65134 37502 99209 99233 32175 42481
73 16239 50250 90999 69925 61950 89040 01931 92108 98238
74 84269 07772 03782 60103 33090 45437 63968 53277 02940
75 68754 53054 24939 83333 19746 44062 53519 69357 01084
76 68682 15728 25614 94378 68926 97570 03462 80319 61495
77 08747 73352 20626 24275 99654 33617 28069 77116 64381
78 28767 32388 48340 41654 32065 02363 45002 71942
79 22574 22824 66913 61320 66525 74065 41888 90918
80 51944 11628 15929 15978 10620 60855 85544 95560
81 95168 88580 05828 30531 05791 24029 93317 51687
82 90462 68494 77184 43629 51217 71906 84456 21743
83 55663 27461 18661 07773 38085 17771 45617 68540
84 88992 77369 86242 10464 73665 79231 58348 64557
85 40237 55260 48434 48041 57395 01267 65965 94733
86 91478 95909 92840 32882 10543 08640 92826 60684
87 58634 27116 42176 20796 59839 76922 12319 11974
88 42048 21825 93498 93566 42921 32386 27343 31463
89 38808 70447 24961 49999 11117 58401 64397 41354
90 90300 69784 43135 78042 72657 88914 79443 97598
91 31955 76349 25773 13016 66947 20485 91039 27591
92 83547 74560 15719 01890 56111 32382 28804 44216
93 54598 55314 81737 27417 10126 66240 19827 91437
94 67416 99546 11589 33752 80319 27272 14222 94833
95 30358 17814 01793 46435 64194 39528 11732 13324
96 68728 45462 30736 79921 51994 01248 45812 10917
97 86059 69989 83905 47326 30108 46677 42717 89917
98 35333 92240 66626 24944 77718 49460 73418 59600
99 51182 26746 30987 04950 76717 57682 14351 54440
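One common way of drawing a sample with a table such as Table V is sketched below. The procedure is an assumption for illustration, not a rule stated in this appendix: the cases are numbered 1 to N, only the first two digits of each five-digit group are read, and numbers that are zero, too large, or already drawn are skipped. The helper name `select_from_table` is hypothetical.

```python
def select_from_table(groups, population_size, sample_size, digits=2):
    """Return distinct case numbers (1..population_size) read from digit groups.

    Hypothetical helper: reads the leading `digits` digits of each
    five-digit group in order, skips 0, out-of-range and repeated numbers,
    and stops once `sample_size` cases have been chosen.
    """
    chosen = []
    for group in groups:
        number = int(group[:digits])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                break
    return chosen  # may be shorter than sample_size if the groups run out


# Row 00 of Table V, read from left to right.
row_00 = ["83474", "29226", "28313", "50047", "48885",
          "15141", "09967", "41583", "16311", "37427"]

print(select_from_table(row_00, population_size=60, sample_size=5))
```

With a population of 60 cases this prints `[29, 28, 50, 48, 15]`: the first group (83) is skipped because it exceeds 60. Reading down a column, or using the last two digits instead of the first, works the same way as long as the rule is fixed before the table is consulted.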




TABLE VI. Table of t 


Level of Significance

d.f.   .90   .80   .70   .60   .50   .40   .30   .20   .10
   1  .158  .325  .510  .727 1.000 1.376 1.963 3.078 6.314
   2  .142  .289  .445  .617  .816 1.061 1.386 1.886 2.920
   3                          .765  .978 1.250 1.638 2.353
   4                          .741  .941 1.190 1.533 2.132
   5                          .727  .920 1.156 1.476 2.015
   6                          .718  .906 1.134 1.440 1.943
   7                          .711  .896 1.119 1.415 1.895
   8                          .706  .889 1.108 1.397 1.860
   9                          .703  .883 1.100 1.383 1.833
  10                          .700  .879 1.093 1.372 1.812
  11                          .697  .876 1.088 1.363 1.796
  12                          .695  .873 1.083 1.356 1.782
  13                          .694  .870 1.079 1.350 1.771
  14                          .692  .868 1.076 1.345 1.761
  15                          .691  .866 1.074 1.341 1.753
  16                          .690  .865 1.071 1.337 1.746
  17                          .689  .863 1.069 1.333 1.740
  18                          .688  .862 1.067 1.330 1.734
  19                          .688  .861 1.066 1.328 1.729
  20                          .687  .860 1.064 1.325 1.725
  21  .127  .257  .391  .532  .686  .859 1.063 1.323 1.721
  22                          .686  .858 1.061 1.321 1.717
  23                          .685  .858 1.060 1.319 1.714
  24                          .685  .857 1.059 1.318 1.711
  25                          .684  .856 1.058 1.316 1.708
  26                          .684  .856 1.058 1.315 1.706
  27                          .684  .855 1.057 1.314 1.703
  28                          .683  .855 1.056 1.313 1.701
  29                          .683  .854 1.055 1.311 1.699
  30                          .683  .854 1.055 1.310 1.697
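Each entry of Table VI is the value of t that is exceeded in absolute value (in either tail) with the probability heading its column. The sketch below, which is not the method used to construct the table, checks an entry numerically by integrating Student's t density from 0 to |t| with Simpson's rule. The function names are hypothetical; only Python's standard `math` module is used.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) by integrating the density from 0 to |t|.

    Uses composite Simpson's rule; `steps` must be even.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    central_area = total * h / 3  # area between 0 and |t|
    return 1.0 - 2.0 * central_area


# Entry for 5 d.f. in the .10 column of Table VI.
print(round(two_tailed_p(2.015, 5), 3))
```

Because tabled values are rounded to three decimals, the recovered probability agrees with the column heading only to about three places.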


TABLE VII. 5% and 1% Points for the Distribution of F*

degrees of freedom (for greater mean square)

* Table VII is reproduced from Snedecor: Statistical Methods, Iowa State College Press, Ames, Iowa, by permission of author and publisher.




TABLE VIII. Table of Chi Square*

d.f. 50% 30% 20% 10% 5% 1%
1 .455 1.074 1.642 2.706 3.841 6.635
2 1.386 2.408 3.219 4.605 5.991 9.210
3 2.366 3.665 4.642 6.251 7.815 11.341
4 3.357 4.878 5.989 7.779 9.488 13.277
5 4.351 6.064 7.289 9.236 11.070 15.086
6 5.348 7.231 8.558 10.645 12.592 16.812
7 6.346 8.383 9.803 12.017 14.067 18.475
8 7.344 9.524 11.030 13.362 15.507 20.090
9 8.343 10.656 12.242 14.684 16.919 21.666
10 9.342 11.781 13.442 15.987 18.307 23.209
11 10.341 12.899 14.631 17.275 19.675 24.725
12 11.340 14.011 15.812 18.549 21.026 26.217
13 12.340 15.119 16.985 19.812 22.362 27.688
14 13.339 16.222 18.151 21.064 23.685 29.141
15 14.339 17.322 19.311 22.307 24.996 30.578
16 15.338 18.418 20.465 23.542 26.296 32.000
17 16.338 19.511 21.615 24.769 27.587 33.409
18 17.338 20.601 22.760 25.989 28.869 34.805
19 18.338 21.689 23.900 27.204 30.144 36.191
20 19.337 22.775 25.038 28.412 31.410 37.566
21 20.337 23.858 26.171 29.615 32.671 38.932
22 21.337 24.939 27.301 30.813 33.924 40.289
23 22.337 26.018 28.429 32.007 35.172 41.638
24 23.337 27.096 29.553 33.196 36.415 42.980
25 24.337 28.172 30.675 34.382 37.652 44.314
26 25.336 29.246 31.795 35.563 38.885 45.642
27 26.336 30.319 32.912 36.741 40.113 46.963
28 27.336 31.391 34.027 37.916 41.337 48.278
29 28.336 32.461 35.139 39.087 42.557 49.588
30 29.336 33.530 36.250 40.256 43.773 50.892

* Table VIII is abridged from Table IV of Fisher and Yates: Statistical Tables for Biological, Agricultural and Medical Research, published by Oliver and Boyd Limited, Edinburgh, by permission of authors and publishers.
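The percentages heading Table VIII are upper-tail probabilities: for the listed degrees of freedom, a chi square at least as large as the tabled value occurs with that probability. As a hedged sketch (not the computation behind the published table), the probability for any value can be approximated from the power series of the regularized incomplete gamma function, using only Python's `math` module. The function name is hypothetical.

```python
import math


def chi_square_upper_p(x, df, terms=500):
    """Approximate P(chi square with df degrees of freedom >= x).

    Sums the power series for the regularized lower incomplete gamma
    function P(a, y), with a = df / 2 and y = x / 2, and returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, y = df / 2, x / 2
    term = 1.0 / a          # n = 0 term of sum y**n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, terms):
        term *= y / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-y + a * math.log(y) - math.lgamma(a))
    return 1.0 - lower


# The 5% entry for 1 d.f. and the 1% entry for 10 d.f.
print(round(chi_square_upper_p(3.841, 1), 3), round(chi_square_upper_p(23.209, 10), 3))
```

The series converges for every positive x; for very large chi squares, where 1 - P is tiny, a continued-fraction form of the incomplete gamma function would keep more precision.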




TABLE IX. Values of r at the 5 and 1 Per Cent Levels of Significance* 


5% 1% 5% 1%

N r r N r r
3 .997 .999 38 .320 .413
4 .950 .990 39 .316 .408
5 .878 .959 40 .312 .403
6 .811 .917 41 .308 .398
7 .754 .874 42 .304 .393
8 .707 .834 43 .301 .389
9 .666 .798 44 .297 .384
10 .632 .765 45 .294 .380
11 .602 .735 46 .291 .376
12 .576 .708 47 .288 .372
13 .553 .684 48 .285 .368
14 .532 .661 49 .281 .364
15 .514 .641 50 .279 .361
16 .497 .623 55 .266 .345
17 .482 .606 60 .254 .330
18 .468 .590 65 .244 .317
19 .456 .575 70 .235 .306
20 .444 .561 75 .227 .296
21 .433 .549 80 .220 .286
22 .423 .537 85 .213 .278
23 .413 .526 90 .207 .270
24 .404 .515 95 .202 .263
25 .396 .505 100 .195 .256
26 .388 .496 125 .176 .230
27 .381 .487 150 .159 .210
28 .374 .478 175 .148 .194
29 .367 .470 200 .138 .181
30 .361 .463 300 .113 .148
31 .355 .456 400 .098 .128
32 .349 .449 500 .088 .115
33 .344 .442 600 .080 .105
34 .339 .436 700 .074 .097
35 .334 .430 800 .070 .091
36 .329 .424 900 .065 .086
37 .325 .418 1000 .062 .081


* This table was constructed by Lyle D. Edmison from $r = \dfrac{t}{\sqrt{t^2 + N - 2}}$.


N is the number of pairs used in computing r. 
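The relation in the footnote turns a critical value of t with N - 2 degrees of freedom into the critical r for N pairs, so any row of Table IX can be rebuilt or checked. A minimal sketch, assuming the caller supplies the tabled t (for example, 2.228 and 3.169 are the standard 5 and 1 per cent values of t for 10 d.f.); the function name is hypothetical.

```python
import math


def critical_r(t, n_pairs):
    """Critical r for n_pairs observations, from a critical t with n_pairs - 2 d.f."""
    df = n_pairs - 2
    return t / math.sqrt(t * t + df)


# N = 12 pairs, i.e. 10 degrees of freedom.
print(round(critical_r(2.228, 12), 3), round(critical_r(3.169, 12), 3))
```

The two printed values correspond to the N = 12 row of Table IX. Values for large N depend on how the underlying t values were rounded, so small last-digit differences from the printed table are possible.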


TABLE X. Values of z Corresponding to Values of r

[$z = \frac{1}{2} \log_e \frac{1 + r}{1 - r}$]

r 0 1 2 3 4 5 6 7 8 9
.00 .0000 .0010 .0020 .0030 .0040 .0050 .0060 .0070 .0080 .0090
.01 .0100 .0110 .0120 .0130 .0140 .0150 .0160 .0170 .0180 .0190
.02 .0200 .0210 .0220 .0230 .0240 .0250 .0260 .0270 .0280 .0290
.03 .0300 .0310 .0320 .0330 .0340 .0350 .0360 .0370 .0380 .0390
.04 .0400 .0410 .0420 .0430 .0440 .0450 .0460 .0470 .0480 .0490
.05 .0500 .0510 .0520 .0530 .0541 .0551 .0561 .0571 .0581 .0591
.06 .0601 .0611 .0621 .0631 .0641 .0651 .0661 .0671 .0681 .0691
.07 .0701 .0711 .0721 .0731 .0741 .0751 .0761 .0772 .0782 .0792
.08 .0802 .0812 .0822 .0832 .0842 .0852 .0862 .0872 .0882 .0892
.09 .0902 .0913 .0923 .0933 .0943 .0953 .0963 .0973 .0983 .0993
.10 .1003 .1013 .1024 .1034 .1044 .1054 .1064 .1074 .1084 .1094
.11 .1104 .1115 .1125 .1135 .1145 .1155 .1165 .1175 .1186 .1196
.12 .1206 .1216 .1226 .1236 .1246 .1257 .1267 .1277 .1287 .1297
.13 .1307 .1318 .1328 .1338 .1348 .1358 .1368 .1379 .1389 .1399
.14 .1409 .1419 .1430 .1440 .1450 .1460 .1471 .1481 .1491 .1501
.15 .1511 .1522 .1532 .1542 .1552 .1563 .1573 .1583 .1593 .1604
.16 .1614 .1624 .1634 .1645 .1655 .1665 .1676 .1686 .1696 .1706
.17 .1717 .1727 .1737 .1748 .1758 .1768 .1779 .1789 .1799 .1809
.18 .1820 .1830 .1841 .1851 .1861 .1872 .1882 .1892 .1903 .1913
.19 .1923 .1934 .1944 .1955 .1965 .1975 .1986 .1996 .2007 .2017
.20 .2027 .2038 .2048 .2059 .2069 .2079 .2090 .2100 .2111 .2121
.21 .2132 .2142 .2153 .2163 .2174 .2184 .2195 .2205 .2216 .2226
.22 .2237 .2247 .2258 .2268 .2279 .2289 .2300 .2310 .2321 .2331
.23 .2342 .2352 .2363 .2374 .2384 .2395 .2405 .2416 .2427 .2437
.24 .2448 .2458 .2469 .2480 .2490 .2501 .2512 .2522 .2533 .2543
.25 .2554 .2565 .2575 .2586 .2597 .2608 .2618 .2629 .2640 .2650
.26 .2661 .2672 .2683 .2693 .2704 .2715 .2726 .2736 .2747 .2758
.27 .2769 .2779 .2790 .2801 .2812 .2823 .2833 .2844 .2855 .2866
.28 .2877 .2888 .2899 .2909 .2920 .2931 .2942 .2953 .2964 .2975
.29 .2986 .2997 .3008 .3018 .3029 .3040 .3051 .3062 .3073 .3084
.30 .3095 .3106 .3117 .3128 .3139 .3150 .3161 .3172 .3183 .3194
.31 .3205 .3217 .3228 .3239 .3250 .3261 .3272 .3283 .3294 .3305
.32 .3316 .3328 .3339 .3350 .3361 .3372 .3383 .3395 .3406 .3417
.33 .3428 .3440 .3451 .3462 .3473 .3485 .3496 .3507 .3518 .3530
.34 .3541 .3552 .3564 .3575 .3586 .3598 .3609 .3620 .3632 .3643
.35 .3654 .3666 .3677 .3689 .3700 .3712 .3723 .3734 .3746 .3757
.36 .3769 .3780 .3792 .3803 .3815 .3826 .3838 .3850 .3861 .3873
.37 .3884 .3896 .3907 .3919 .3931 .3942 .3954 .3966 .3977 .3989
.38 .4001 .4012 .4024 .4036 .4047 .4059 .4071 .4083 .4094 .4106
.39 .4118 .4130 .4142 .4153 .4165 .4177 .4189 .4201 .4213 .4225
.40 .4236 .4248 .4260 .4272 .4284 .4296 .4308 .4320 .4332 .4344
.41 .4356 .4368 .4380 .4392 .4404 .4416 .4428 .4441 .4453 .4465
.42 .4477 .4489 .4501 .4513 .4526 .4538 .4550 .4562 .4574 .4587
.43 .4599 .4611 .4624 .4636 .4648 .4660 .4673 .4685 .4698 .4710
.44 .4722 .4735 .4747 .4760 .4772 .4784 .4797 .4809 .4822 .4834
.45 .4847 .4860 .4872 .4885 .4897 .4910 .4922 .4935 .4948 .4960
.46 .4973 .4986 .4999 .5011 .5024 .5037 .5049 .5062 .5075 .5088
.47 .5101 .5114 .5126 .5139 .5152 .5165 .5178 .5191 .5204 .5217
.48 .5230 .5243 .5256 .5269 .5282 .5295 .5308 .5321 .5334 .5347
.49 .5361 .5374 .5387 .5400 .5413 .5427 .5440 .5453 .5466 .5480
.50 .5493 .5506 .5520 .5533 .5547 .5560 .5573 .5587 .5600 .5614


This table was constructed by John P. Malloy. 




TABLE X (continued) 


r 2 8
.51 .5654 .5736
.52 .5791 .5874
.53 .5929 .6013
.54 .6070 .6155
.55 .6213 .6300
.56 .6358 .6446
.57 .6505 .6595
.58 .6655 .6746
.59 .6807 .6900
.60 .6963 .7057
.61 .7121 .7218
.62 .7283 .7381
.63 .7447 .7548
.64 .7616 .7718
.65 .7788 .7893
.66 .7964 .8071
.67 .8144 .8254
.68 .8328 .8441
.69 .8518 .8634
.70 .8712 .8832
.71 .8912 .9035
.72 .9118 .9245
.73 .9330 .9461
.74 .9549 .9684
.75 .9775 .9915
.76 1.0010 1.0154
.77 1.0253 1.0403
.78 1.0505 1.0661
.79 1.0768 1.0931
.80 1.1042 1.1212
.81 1.1329 1.1507
.82 1.1630 1.1817
.83 1.1946 1.2144
.84 1.2280 1.2490
.85 1.2634 1.2857
.86 1.3011 1.3249
.87 1.3414 1.3670
.88 1.3847 1.4124
.89 1.4316 1.4618
.90 1.4828 1.5160
.91 1.5393 1.5762
.92 1.6022 1.6438
.93 1.6734 1.7211
.94 1.7555 1.8117
.95 1.8527 1.9210
.96 1.9721 2.0595
.97 2.1273 2.2494
.98 2.3507 2.5550
.99 2.7587 3.4534
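The formula in the heading of Table X is the inverse hyperbolic tangent, so any entry of the table, including columns not shown, can be regenerated directly. A short sketch using only Python's `math` module; the function names are hypothetical, and `z_to_r` simply uses the fact that the inverse of the transformation is `tanh`.

```python
import math


def r_to_z(r):
    """z = (1/2) log_e((1 + r) / (1 - r)), defined for -1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))


def z_to_r(z):
    """Inverse of r_to_z; equivalent to the hyperbolic tangent of z."""
    return math.tanh(z)


# Rebuild one row of Table X (r = .50 to .509) to four decimals.
print([round(r_to_z(0.50 + k / 1000), 4) for k in range(10)])
```

`math.atanh(r)` gives the same value as `r_to_z(r)`. Values that fall exactly on a rounding boundary may differ in the fourth decimal from a printed table computed by other means.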




TABLE XI. Tetrachoric Correlation from the Phi Coefficient
[$r = \sin(\phi \cdot 90^\circ)$]




TABLE XI (continued)

φ 2 3 4 5 6 7 8 9
.50 .7093 .7104 .7115 .7126 .7137 .7148 .7159 .7170
.51 .7203 .7214 .7225 .7236 .7247 .7257 .7268 .7279
.52 .7311 .7322 .7333 .7343 .7354 .7364 .7375 .7386
.53 .7417 .7428 .7438 .7449 .7459 .7470 .7480 .7491
.54 .7522 .7532 .7543 .7553 .7563 .7573 .7584 .7594
.55 .7624 .7635 .7645 .7655 .7665 .7675 .7685 .7695
.56 .7725 .7735 .7745 .7755 .7765 .7775 .7785 .7794
.57 .7824 .7834 .7844 .7853 .7863 .7873 .7882 .7892
.58 .7921 .7930 .7940 .7949 .7959 .7968 .7978 .7987
.59 .8016 .8025 .8034 .8044 .8053 .8062 .8072 .8081
.60 .8109 .8118 .8127 .8136 .8145 .8154 .8163 .8172
.61 .8199 .8208 .8217 .8226 .8235 .8244 .8253 .8262
.62 .8288 .8297 .8306 .8315 .8323 .8332 .8341 .8349
.63 .8375 .8384 .8392 .8401 .8409 .8418 .8426 .8435
.64 .8460 .8468 .8477 .8485 .8493 .8502 .8510 .8518
.65 .8543 .8551 .8559 .8567 .8575 .8583 .8591 .8599
.66 .8623 .8631 .8639 .8647 .8655 .8663 .8671 .8679
.67 .8702 .8710 .8717 .8725 .8733 .8740 .8748 .8755
.68 .8778 .8786 .8793 .8801 .8808 .8815 .8823 .8830
.69 .8852 .8860 .8867 .8874 .8881 .8888 .8896 .8903
.70 .8924 .8931 .8938 .8945 .8952 .8959 .8966 .8973
.71 .8994 .9001 .9008 .9015 .9021 .9028 .9035 .9042
.72 .9061 .9068 .9075 .9081 .9088 .9094 .9101 .9107
.73 .9127 .9133 .9140 .9146 .9152 .9159 .9165 .9171
.74 .9190 .9196 .9203 .9209 .9215 .9221 .9227 .9233
.75 .9251 .9257 .9263 .9269 .9274 .9280 .9286 .9292
.76 .9309 .9315 .9321 .9326 .9332 .9338 .9343 .9349
.77 .9365 .9371 .9376 .9382 .9387 .9393 .9398 .9403
.78 .9419 .9425 .9430 .9435 .9440 .9445 .9450 .9456
.79 .9471 .9476 .9481 .9486 .9491 .9496 .9501 .9506
.80 .9520 .9525 .9530 .9535 .9539 .9544 .9549 .9554
.81 .9567 .9572 .9576 .9581 .9585 .9590 .9594 .9599
.82 .9612 .9616 .9620 .9625 .9629 .9633 .9637 .9641
.83 .9654 .9658 .9662 .9666 .9670 .9674 .9678 .9682
.84 .9694 .9698 .9701 .9705 .9709 .9712 .9716 .9720
.85 .9731 .9735 .9738 .9742 .9745 .9749 .9752 .9756
.86 .9766 .9769 .9773 .9776 .9779 .9783 .9786 .9789
.87 .9799 .9802 .9805 .9808 .9811 .9814 .9817 .9820
.88 .9829 .9832 .9834 .9837 .9840 .9843 .9846 .9848
.89 .9856 .9859 .9862 .9864 .9867 .9870 .9872 .9874
.90 .9882 .9884 .9887 .9889 .9891 .9893 .9896 .9898
.91 .9904 .9907 .9909 .9911 .9913 .9915 .9917 .9919
.92 .9925 .9927 .9929 .9931 .9933 .9934 .9936 .9938
.93 .9943 .9945 .9946 .9948 .9950 .9951 .9953 .9954
.94 .9958 .9960 .9961 .9963 .9964 .9965 .9967 .9968
.95 .9971 .9973 .9974 .9975 .9976 .9977 .9978 .9979
.96 .9982 .9983 .9984 .9985 .9986 .9987 .9987 .9988
.97 .9990 .9991 .9992 .9992 .9993 .9993 .9994 .9995
.98 .9996 .9997 .9997 .9997 .9998 .9998 .9998 .9999
.99 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999
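The bracketed relation in the heading of Table XI gives each entry as the sine of phi times 90 degrees. As a sketch (the function name is hypothetical), the table, including the columns and rows not reproduced here, can be regenerated with Python's `math` module. Whether this relation is an adequate estimate of the tetrachoric coefficient for a particular fourfold table is a question for the text, not for the code.

```python
import math


def tetrachoric_from_phi(phi):
    """Estimate of tetrachoric r from phi, using r = sin(phi * 90 degrees)."""
    return math.sin(math.radians(90 * phi))


# phi = .502 corresponds to the first entry of the column headed 2.
print(round(tetrachoric_from_phi(0.502), 4))
```

A negative phi gives a negative r of the same size, since the sine is an odd function.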


TABLE XII. Correction Factors for Coarse Grouping
NUMBER OF CATEGORIES

2 3 4 5 6 7 8 9 10

.01 1.253 1.112 1.068 1.047 1.037 1.030 1.025 1.022 1.019
.02 1.253 1.112 1.068 1.047 1.037 1.030 1.025 1.022 1.019
.03 1.253 1.112 1.068 1.047 1.037 1.030 1.025 1.022 1.019
.04 1.253 1.112 1.068 1.047 1.037 1.030 1.025 1.022 1.019
.05 1.253 1.112 1.068 1.047 1.037 1.030 1.025 1.022 1.019
.06 1.252 1.111 1.068 1.047 1.037 1.030 1.025 1.022 1.019
.07 1.252 1.111 1.068 1.047 1.037 1.030 1.025 1.022 1.019
.08 1.252 1.111 1.067 1.047 1.037 1.030 1.025 1.022 1.019
.09 1.251 1.111 1.067 1.047 1.037 1.030 1.025 1.022 1.019
.10 1.251 1.111 1.067 1.047 1.037 1.029 1.025 1.022 1.019
.11 1.250 1.111 1.067 1.047 1.037 1.029 1.025 1.022 1.019
.12 1.250 1.111 1.067 1.047 1.036 1.029 1.025 1.021 1.019
.13 1.249 1.110 1.067 1.047 1.036 1.029 1.025 1.021 1.019
.14 1.248 1.110 1.067 1.047 1.036 1.029 1.025 1.021 1.019
.15 1.248 1.110 1.067 1.047 1.036 1.029 1.025 1.021 1.019
.16 1.247 1.109 1.066 1.046 1.036 1.029 1.025 1.021 1.019
.17 1.246 1.109 1.066 1.046 1.036 1.029 1.024 1.021 1.019
.18 1.245 1.108 1.066 1.046 1.036 1.029 1.024 1.021 1.019
.19 1.244 1.108 1.066 1.046 1.036 1.029 1.024 1.021 1.019
.20 1.243 1.108 1.065 1.046 1.035 1.029 1.024 1.021 1.018
.21 1.242 1.107 1.065 1.046 1.035 1.029 1.024 1.021 1.018
.22 1.241 1.107 1.065 1.045 1.035 1.028 1.024 1.021 1.018
.23 1.240 1.106 1.065 1.045 1.035 1.028 1.024 1.021 1.018
.24 1.239 1.106 1.064 1.045 1.035 1.028 1.024 1.021 1.018
.25 1.237 1.105 1.064 1.045 1.035 1.028 1.024 1.020 1.018
.26 1.236 1.105 1.064 1.045 1.034 1.028 1.024 1.020 1.018
.27 1.234 1.104 1.063 1.044 1.034 1.028 1.023 1.020 1.018
.28 1.233 1.103 1.063 1.044 1.034 1.028 1.023 1.020 1.018
.29 1.232 1.103 1.063 1.044 1.034 1.027 1.023 1.020 1.018
.30 1.230 1.102 1.062 1.044 1.034 1.027 1.023 1.020 1.018
.31 1.228 1.101 1.062 1.043 1.033 1.027 1.023 1.020 1.017
.32 1.227 1.101 1.061 1.043 1.033 1.027 1.023 1.020 1.017
.33 1.226 1.100 1.061 1.043 1.033 1.027 1.023 1.020 1.017
.34 1.224 1.100 1.060 1.042 1.033 1.027 1.022 1.020 1.017
.35 1.222 1.099 1.060 1.042 1.032 1.026 1.022 1.019 1.017
.36 1.220 1.098 1.059 1.042 1.032 1.026 1.022 1.019 1.017
.37 1.218 1.097 1.05 1.041 1.032 1.026 1.022 1.019 1.017
.38 1.216 1.096 1.058 1.041 1.032 1.026 1.022 1.019 1.017
.39 1.214 1.095 1.057 1.041 1.031 1.025 1.022 1.019 1.016
.40 1.212 1.095 1.057 1.040 1.031 1.025 1.021 1.019 1.016
.41 1.210 1.094 1.056 1.040 1.031 1.025 1.021 1.018 1.016
.42 1.208 1.093 1.055 1.040 1.031 1.025 1.021 1.018 1.016
.43 1.206 1.092 1.055 1.039 1.030 1.024 1.021 1.018 1.016
.44 1.204 1.091 1.054 1.039 1.030 1.024 1.021 1.018 1.016
.45 1.201 1.090 1.054 1.039 1.030 1.024 1.020 1.018 1.016
.46 1.199 1.089 1.053 1.038 1.029 1.024 1.020 1.017 1.015
.47 1.197 1.088 1.053 1.038 1.029 1.023 1.020 1.017 1.015
.48 1.194 1.087 1.052 1.037 1.029 1.023 1.020 1.017 1.015
.49 1.192 1.086 1.051 1.037 1.028 1.023 1.020 1.017 1.015
.50 1.189 1.085 1.051 1.036 1.028 1.023 1.019 1.017 1.015




TABLE XII (continued)

NUMBER OF CATEGORIES

7 8

.51 1.022 1.019
.52 1.022 1.019
.53 1.022 1.018
.54 1.022 1.018
.55 1.021 1.018
.56 1.021 1.018
.57 1.021 1.017
.58 1.021 1.017
.59 1.020 1.017
.60 1.020 1.017
.61 1.020 1.016
.62 1.019 1.016
.63 1.019 1.016
.64 1.019 1.015
.65 1.018 1.015
.66 1.018 1.015
.67 1.018 1.014
.68 1.017 1.014
.69 1.017 1.014
.70 1.016 1.013
.71 1.016 1.013
.72 1.015 1.013
.73 1.015 1.012
.74 1.014 1.012
.75 1.014 1.012
.76 1.014 1.011
.77 1.013 1.011
.78 1.013 1.010
.79 1.012 1.010
.80 1.012 1.010
.81 1.011 1.009
.82 1.011 1.009
.83 1.010 1.008
.84 1.010 1.008
.85 1.009 1.008
.86 1.009 1.007
.87 1.008 1.007
.88 1.007 1.006
.89 1.007 1.006
.90 1.006 1.005
.91 1.006 1.005
.92 1.005 1.004
.93 1.004 1.004
.94 1.004 1.003
.95 1.003 1.003
.96 1.003 1.002
.97 1.002 1.002
.98 1.001 1.001
.99 1.001 1.001
1.00 1.000 1.000


Index 


Achievement tests, 319 
Adjustment for disproportionality 
analysis of variance, 213 
Adjustment of criterion means in analy- 
sis of covariance, 348, 361 
Analysis of covariance, 137, 343-363 
multiple classification, 352-362 
single classification, 344-363 
Analysis of linear regression, 232-249 
Analysis of nonlinear regression, 282-291 
Analysis of variance, 173-225 
applied to frequency distributions, 179-183
applied to two groups, 137-141
assumptions underlying the analysis of variance, 183
correction for disproportionality, 211-225
equal frequency in independent groups, 173-177
pairing of cases, 204-207
relation of t to F for comparing two groups, 178-179
testing significance between two means following analysis of variance, 183
triple classification, 199-204
unequal frequency in groups being compared, 177-178
Aptitude tests, 319 
Area under the normal curve, 63-67 
Arithmetic mean, 23 
Array, 29 
Attenuation, correction for, 336-337 
Average, 22-23 
Average deviation, 53 




Bias in sampling, 104-106 
Biased errors in measurement, 324-325 
Biserial correlation, 257-263 
assumptions underlying, 257-258 
computation, 258-263 


Centiles, 42, 46-48 
Central tendency, measures of, 22-37 
in nonrounded distributions, 31-32 


principles of interpretation, 32-37 


Characteristics, continuum and noncontinuum, 1-2
Cheshire, L., 303
Chi square, 146-169
difference between several correlation coefficients, 298
direct solution of chi square, 159-162
four-cell contingency tables, 151-155
goodness of fit, 166-169
in estimation, 146-150
in testing hypotheses, 150-151
limitations in the application of chi square, 164-166
multiple-cell contingency tables, 155-164
relationship to phi coefficient, 302
small-cell frequency, four-cell tables, 154-155
small-cell frequency, multiple-cell tables, 157-159
table of chi square values, 423
tables with more than thirty degrees of freedom, 162-164
Class intervals
assumptions of distribution of scores in class interval, 25, 29-30
midpoint of, 17
reported limits, 5
size of, 4-5
theoretical limits of, 17-18
Classical theory of sampling, 92-101
assumptions of, 94, 97
characteristics of, 94-99
importance of, 101
Coarse grouping, 303-313
Cochran, W. G., 109, 184
Coefficient
of contingency, 315
of predictive effectiveness, 334
of reliability, 330, 332-334
of validity, 334-338
of variation, 59-61
Coefficients in nonvariable distributions, 313-315
Columns of a contingency table, 151


Compensating errors in measurement, 324-325
Compound units, 18-20 
Confidence limits, 127-128 
Contingency tables, 3-4, 151 
Continuous units, 15-18 
Correction factors for coarse grouping, 429-430
Correction for attenuation, 336-337 
Correlation chart, 9-10 
Correlation, coefficient of product-moment, 74-89, 332
computation of, 78-87 
interpretation of, 74-78 
techniques of analysis, 294-300 
Correlation, index of, 286 
Correlation, multiple, 240 
testing significance of, 240-249 
Correlation, partial, 248, 249-252 
testing significance of, 248 
Correlation, rank order, 87-89 
Correlated samples, 141-142 
Covariance, analysis of, 343-363 
multiple classification, 352-362 
single classification, 344-364 
Cox, G. M., 109, 184 
Critical ratio, 99 
Criterion, choice of, 125 
Criterion variable, 127, 213, 334-335 
Crossproducts, 81, 85 
Cumulative frequency distribution, 6, 9 


Davis, F. B., 340 
Deciles, 40-41 
computation of, 44-46 
Defining terms, 2-3 
Degrees of freedom 
analysis of covariance, 347 
analysis of linear regression, 232-236, 238, 242
analysis of nonlinear regression, 285
analysis of variance, 138-141, 173-183, 190, 198, 205, 215-222
correlation analysis, 299
chi square, 148, 155-156
discriminant analysis, 266, 368
estimating population parameters, 112
two independent groups, 130-133, 135-137
two paired groups, 141-142, 204-207 
Descriptive units, 2, 14-15 
quantifying descriptive units, 67-71 
Design of research studies, 127 
Diagrams, scatter, 76, 86 
Difference between two means, classical theory, 99
correlated samples, 141-142 
pooled variance, 135-137 
separate group variance, 129-135 


Difference between two variances, 133-135
Discrete units, 15-16 
Discriminant analysis, 257
Discriminant equation, 263-270
Discriminant function, 367
Distributions, frequency, 4-13 
negatively skewed, 12 
normal, 12 
positively skewed, 13 
symmetrical, 33-34 
two-way, 9-10 
Doolittle solution, the abbreviated, 391 


Edmison, Lyle D., 424 
Equations, discriminant, 263-264, 364 
regression, 226 
solution of, 227, 285, 346, 389 
Errors in measurement, 327 
Estimates, 106 
Estimation, 103 


F, value of 
analysis of covariance, 347 
analysis of linear regression, 242 
analysis of nonlinear regression, 285 
analysis of variance, 176, 189, 196, 206, 215
correlation analysis, 299 
discriminant analysis, 266, 268 
meaning of, 134-135 
table of 1% and 5% values, 419-422 
Fiducial limits, 112-113, 116 
Fiducial probability, 113 
Finite population, 120-121 
Fisher, R. A., 93, 263, 294, 295, 297, 364, 418, 423
Flanagan, John C., 340 
Frequency distributions, 4-8 


Goodness of fit, 166-169 
Graphic presentation, 7-9 
Guessed mean, 25-26 
Guilford, J. P., 340 


Histogram, 8 
Homogeneity of variance, 184 
Hotelling, H., 299 
Hypotheses, 123-126 
criteria for testing, 125 
null hypothesis, 123 
statement of, 124 


Index of reliability, 331 
Interaction 
analysis of covariance, 355-362
analysis of variance, 192-196 




Interval limits of continuous units, 16-18
Item analysis, 338-341 


Jaspen, Nathan, 272 
Johnson, P. O., 133, 184, 196 


Kuder, G. F., 334 


Least squares, 227 
Limits, fiducial, 112-113 
Limits of class intervals, 5, 17-18 
Linear regression, 227-252 
analysis of, 232-249 
multiple regression, 237-252 
single prediction variable, 226-237 


Malloy, John P., 425 
Mathematical models, 127 
Mean, arithmetic, 23 
as a measure of central tendency, 23 
computation of, 23-28 
principles of interpretation, 32-37 
standard error of, 96 
Mean deviation, 53 
Mean sigma distance, 69 
Measurement data, 1 
Measurement, statistical analysis in, 318-341
Measurement, in research, 318-320 
Measurement, errors in, 327 
Measurement, standard error of, 331 
Median, computation of, 29-31 
definition of, 28 
principles of interpretation, 32-37 
Mid-point of class intervals, 17 
Mid-score, 28 
Miscellaneous tests, 319 
Mode, 31 
Multicell correlation, 304 
Multiple regression, 237-249 
Multiple serial correlation, 275-277 
Multiple correlation (see Regression) 
coefficient, 240 
computed from zero-order coefficients, 251-252
testing significance of, 240-249 


Negative coefficient of correlation, 76-78
Nonlinear regression, 282-291 
Nonsignificance, 127-129 
Normal curve, areas under, 63-67 
central tendency, 32-33 
equation for, 63 
in classical theory of sampling, 96-97 
tables of, 404-410 
variability, 63 
Normal equations, 227, 357 


Null hypotheses, 123 
Numerical evidence, 1-2 
Numerical units, 13-14 
Objectivity of tests, 329
Ogive, 9 
Ordinate, 9 
Ordinates and areas of the normal curve, 404-410


Parameter, 106 
Partial biserial correlation, 269-271 
Partial correlation, 248, 249-252 
Patterson, R. E., 216 
Pearson, Karl, 78, 303 
Pearson product moment coefficient of 
correlation, 74-89 
Percentages, computation of, 3-4 
Percentiles and percentile ranks, computation of, 42, 46-48
Perfect measuring instrument, characteristics of, 320-331
Peters, C. C., 304, 307 
Phi coefficient, 300-303 
Point biserial correlation, 314-315 
Population 
classical theory, 92 
statistical inference, 103 
types of populations, estimation, 106 
types of populations, testing hypotheses, 125
Prediction variable, 213 
Probability, 99-100, 113 
Proof, standards required for, 128-129 


Quantifying descriptive units, 67-71 
Quartiles, computation of, 39-40, 44-46 
Quartile deviation, 51-53 


Random numbers, table of, 416-417 
use of table, 109 
Random sample, 94 
Range, 50-51 
of correlation coefficients, 76 
semi-interquartile, 51-53 
Rank order coefficient of correlation, 87-89
Regression coefficient, determination of, 389-391
Regression equation, 226 
Regression line, 226 
test for linearity, 282-291 
Regression, multiple, 237-249 
Reliability of tests, 328-331 
computing test reliability, 332-334 
Richardson, M. W., 334 
Residual sum of squares, 229-231
Residuals, 227 
Rows of a contingency table, 151 




Saffir, M., 303
Sample, 92
Sampling distributions, 96, 98, 111, 295
Sampling error, 106
Sampling methods, 108-110
cluster sampling, 108
stratified sampling, 108
Sampling statistic, 92
Scatter diagram, 76, 86
Semi-interquartile range, 51-53
Serial correlation and discriminant analysis
assumptions underlying, 257-258
computation, 259, 271-274, 312-313
multiple serial correlations, 275-276
serial correlation, generalized formula, 271-275
Sigma, 55
Significance, tests of, 106
Simultaneous equations, 226, 264, 389
Skewness, 12-13
Slaichert, William M., 303
Snedecor, G. W., 93, 196, 418
Spearman, Charles, 78
Spearman-Brown formula, 332
Spearman rank order correlation, 78
Squares, square roots, and reciprocals, table of, 394-403
Standard deviation, 54, 56-59
standard error of, 98
Standard error of
coefficient of correlation, 98
differences between correlation coefficients, 98
mean difference, 99
mean, random sample, 96
mean, stratified sample, 119
measurement, 331
median, 98
proportion, 98
standard deviation, 98
Z-function, 296
Standard scores, 61-63
mean of a distribution of, 62
standard deviation of a distribution, 62
Statistic, 106
Statistical control, 126
Statistical hypothesis, 123-126
Statistical inference
estimation, 103-121
testing hypotheses, 123-129
Statistical methodology, 1
Statistical significance, 127-129
Statistical techniques in measurement, 318-341
Stratification, 126-127
“Student,” 101




Sum of squares 
analysis of covariance 
difference, 351 
interaction, 355 
residuals for within subgroups, 347 
total, 345 
within plus main effects, 357-360 
within subgroups, 345-346 
analysis of variance 
between pairs in correlated samples, 141-142
groups, 138-141, 175-178, 213 
interaction, 195-196, 213 
total, 138-141, 175-178 
within groups, 138-141, 175-178 
correlation chart, 86-87 
discriminant analysis, 367-369 
pooled variance, 135-137 
separate group variance, 130-135 
standard deviation of a distribution, 57
within sum of squares from stratified sample, 116-119
regression 
linear regression, 232, 238 
linear residuals, 231, 238 
quadratic regression, 285-288 
quadratic residuals, 285-288 
total, 229-230, 238 
Summation, 24 


t-statistic
biserial correlation, 260-261
correlation in coarsely grouped distributions, 312-313
covariance, 347
estimation, 111-112
linear regression, 236, 249-250
pairing of cases, 204-207
relation to chi square, 148
relation to F, 178-179
testing hypotheses, 130-142
Tables of
chi square, 423
correction factors for coarse grouping, 429-430
F, 5 and 1 per cent values of, 419-422
ordinates and areas of the normal curve (in terms of σ-units from the mean), 404-405
ordinates and σ-units of the normal curve (in terms of p), 406-410

Pa si 

— and — ,411-415 

2 Pq 

r at the 5 and 1 per cent levels of significance, 424
random numbers, 416-417
squares, square roots, and reciprocals, 394-403




t, 418
tetrachoric correlation from the phi coefficient, 427-428
values of Z corresponding to values of r, 425-426
Tally, 5 
Test of significance, 131 
Tests, classification of, 319-320 
Tetrachoric correlation, 303-304 
Thurstone, L. L., 303 
Tiedeman, David V., 364 


Universe, 92, 103 


Variable characteristic, 1 


Variance, 55 
Variance, pooled, 135-137 
Variance ratio, 134 


Wechsler, David, 19 
X-axis, 7 


Y-axis, 7 
Yates, Frank, 109, 154, 295, 418, 423 


Z-transformation of r, 295-300 
Z, values of corresponding to values of r, 425-426

